Grid-Based Particles in Fragment Shaders: Implementation and Optimization

Dec 29, 2025

30 min

The plan

This article shows how to create particle effects using only fragment shaders.

Why? When you try to create a particle system that runs in a single full-screen draw, you notice that such a constraint forces you to figure out some unconventional rendering techniques. It sharpens your shader programming skills and unlocks a new way of thinking about shaders.

In this article I present an easy-to-follow case study of implementing such particles.

In this article, I will walk through:

  1. The origin of the idea.

  2. The original ShaderToy shader that I will modify.

  3. Implementing fragment-shader particles in Unity.

    • Implementing grid-based instancing of simple shapes.

    • Rendering the particles on a 3D plane.

    • Using many moving planes to fake dense 3D particle fields.

  4. Integrating custom particles with a ShaderToy shader.

  5. Profiling the created particles and doing a few optimization iterations.


___

The origin of the idea

While browsing LinkedIn, I came across a shader made by Jaiden Ortiz. He created this Christmas diorama to practice SDF modelling, raymarching, lighting, and simple animation.

https://www.shadertoy.com/view/wfKcDK


I really think that when you learn graphics programming, you should separate shader programming from coding with graphics APIs. For me, this shader is a textbook example of the learning that happens when you explore what's possible with fragment shaders. I love it!

However, it has a small issue. The snow particles glitch and stop after a while. So I asked Jaiden if I could fix that.


So this blog article documents the whole process of fixing and optimizing the fragment shader particles!


___

Understanding ShaderToy rendering

The shader is implemented on ShaderToy, a platform where you can only write fullscreen fragment shaders. For me, it is the best platform to learn shader programming, because with this constraint you need to think really creatively to implement anything.

This is how ShaderToy looks. On the left, you can see your shader rendered fullscreen. On the right, there is the source code. The function void mainImage( out vec4 fragColor, in vec2 fragCoord ) is the entry point of the fragment shader, and it is invoked for each pixel of the image.
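If you have never used ShaderToy: the smallest possible shader just writes a color for every pixel. A minimal example (my own illustration, not part of the diorama) that visualizes the normalized pixel coordinates:

// The smallest useful ShaderToy shader: output the normalized pixel coordinates as a color.
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // fragCoord is the pixel position in pixels, iResolution is the viewport size in pixels.
    vec2 uv = fragCoord / iResolution.xy; // 0..1 across the screen
    fragColor = vec4(uv, 0.0, 1.0);       // red = horizontal, green = vertical
}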


___

Usual 3D rendering path in ShaderToy

Usually, to render 3D scenes in a fullscreen shader you need to:

  1. Define a camera. Usually by defining a ray for each pixel and transforming it with a matrix.

  2. Trace the scene. Here you will see ray tracing or ray marching.

  3. Color the results. Results returned by tracing the scene are colored using some lighting algorithms.


___

Define a camera

In this shader, the camera is defined by a per-pixel ray and a matrix that transforms the ray direction.

The ray is created from the screen UV, and the camera matrix is built from the ray origin and a look-at position.

ro - ray origin
rd - ray direction
cam - camera matrix
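This is the standard ShaderToy look-at camera pattern. A minimal sketch of that pattern (my own reconstruction, not the diorama's exact code) looks like this:

// Build a camera rotation matrix from the camera position and a look-at target.
mat3 camera(vec3 ro, vec3 target, float roll)
{
    vec3 forward = normalize(target - ro);          // camera forward axis
    vec3 tilt    = vec3(sin(roll), cos(roll), 0.0); // "up" reference, tilted by the roll angle
    vec3 right   = normalize(cross(forward, tilt)); // camera right axis
    vec3 up      = cross(right, forward);           // camera up axis
    return mat3(right, up, forward);                // the three columns of the matrix
}

// Then, per pixel:
// vec3 rd = cam * normalize(vec3(uv, 1.0)); // rotate a view-space ray into world space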


___

Trace the scene and color the results

Then, after the ray direction and ray origin are created, the 3D scene is raymarched and colored in the rayMarch(ro, rd) function.


The rayMarch function traces the scene using a classic SDF raymarching algorithm. The scene is defined by an SDF (signed distance field): a function that, for any point in 3D space, returns the distance to the closest surface. The distance is signed, so if you are inside an object, the distance is negative.

Raymarching works by marching along the ray, sampling the field, and making small steps forward as long as the surface is not hit. This is a 2D visualization.


Source (interactive shader): https://www.shadertoy.com/view/4lyBDV

Here, the scene is modeled with simple shapes like spheres, cones, and cubes, and it is colored depending on which object was hit.
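Stripped down to its essentials, the marching loop follows this pattern (a minimal sketch with a single-sphere scene, not the diorama's actual map or rayMarch code):

// The scene SDF: here just a unit sphere at the origin.
float mapSketch(vec3 p)
{
    return length(p) - 1.0;
}

// Classic sphere tracing: step forward by the sampled distance until a surface is hit.
float rayMarchSketch(vec3 ro, vec3 rd)
{
    float t = 0.0;
    for (int i = 0; i < 100; i++)
    {
        vec3 p  = ro + rd * t;  // current position along the ray
        float d = mapSketch(p); // distance from p to the closest surface
        if (d < 0.001) break;   // close enough - treat it as a hit
        t += d;                 // it is always safe to march forward by d
        if (t > 100.0) break;   // the ray left the scene without hitting anything
    }
    return t; // distance along the ray to the hit (or to the far limit)
}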


___

Implementing the particles

Now that I understand how the original shader works, I can create better snow particles. The original particles had a bug where they glitched after a while.

Here I will create a new particle system from scratch that avoids this issue.

Let's implement the nice-looking particles first, without the 3D scene, and then I will think about how to integrate them with the shader.

I will implement this in HLSL in Unity, just to be able to profile the particles and iterate on optimization. After the particles are done, I will port them into GLSL and ShaderToy.


___

Initial setup

I'm using Unity 6000.2.9f1 with the URP render pipeline. I created this component to render a fullscreen transparent draw (just two triangles, using a material):

This script hooks into Unity's render pipeline. The RenderPipelineManager.beginCameraRendering event fires before each camera renders. We use it to draw our fullscreen effect. The Graphics.RenderPrimitives call renders 6 vertices as triangles, which makes 2 triangles that cover the screen.

[ExecuteAlways]
public class ProceduralRenderer : MonoBehaviour
{
    public Material material;

    private void OnEnable()
    {
        // Register to the RenderPipelineManager
        RenderPipelineManager.beginCameraRendering += RenderPipelineManager_beginCameraRendering;
    }

    private void OnDisable()
    {
        // Unregister from the RenderPipelineManager
        RenderPipelineManager.beginCameraRendering -= RenderPipelineManager_beginCameraRendering;
    }

    private void RenderPipelineManager_beginCameraRendering(ScriptableRenderContext context, Camera camera)
    {
        bool isGoodCamera = camera.cameraType == CameraType.Game;
        isGoodCamera |= camera.cameraType == CameraType.SceneView;

        if (!isGoodCamera || material == null)
            return;

        RenderParams renderParams = new RenderParams()
        {
            camera = camera,
            material = material,
            worldBounds = new Bounds(Vector3.zero, Vector3.one * 10000.0f),
            // ... Set other properties here
        };

        // Render!
        Graphics.RenderPrimitives(renderParams, MeshTopology.Triangles, 6, 1); // Render 2 triangles with the material
    }
}


Then I created a shader that draws a fullscreen rectangle:

Shader "ProceduralPixels/FragmentShaderParticles-Blog"
{
    SubShader
    {
        Tags
        {
            "RenderType" = "Transparent"
            "IgnoreProjector" = "True"
            "RenderPipeline" = "UniversalPipeline"
        }

        Pass
        {
            Name "ForwardLit"
            Tags
            {
                "LightMode" = "UniversalForwardOnly"
            }

            // No depth testing
            Cull Off
            ZWrite Off
            ZTest Off

            // Traditional alpha blend
            Blend SrcAlpha OneMinusSrcAlpha

            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag

            #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"

            // Fullscreen quad from 6 vertices (2 triangles)
            // These UVs form two triangles: (0,0)-(0,1)-(1,1) and (0,0)-(1,1)-(1,0)
            static float2 quadUVs[] = 
            {
                float2(0.0f, 0.0f), float2(0.0f, 1.0f), float2(1.0f, 1.0f),
                float2(0.0f, 0.0f), float2(1.0f, 1.0f), float2(1.0f, 0.0f)
            };

            struct FragmentData
            {
                float4 positionCS : SV_Position;
                float2 uv : TEXCOORD0;
            };


            FragmentData vert(uint vertexID : SV_VertexID)
            {
                FragmentData output;

                output.positionCS = float4(quadUVs[vertexID] * 2.0 - 1.0, 0.5, 1.0);
                output.uv = quadUVs[vertexID];

                #if UNITY_UV_STARTS_AT_TOP
                    output.uv.y = 1.0 - output.uv.y;
                #endif

                return output;
            }


            float4 frag(FragmentData input) : SV_Target0
            { 
	            // Draw UV colors.
                return float4(input.uv.xy, 0.0, 1.0);
            }

            ENDHLSL
        }
    }
}


This is how it looks in Unity:


___

Drawing particles

Let's draw some particles in fullscreen first. From now on, I will be modifying the fragment shader only.

First, I will create a UV that is corrected for the screen aspect ratio:

float2 uv = input.uv.xy;
uv.x *= _ScreenParams.x / _ScreenParams.y;
return float4(pow(uv.xy, 2.0), 0.0, 1.0);


Then, I will use the fractional part to divide the space into a grid.

Taking the fractional part splits the continuous UV space into repeating 0-1 cells, while the integer part becomes the ID of each cell.

uv *= 4.0;
float2 cellID = floor(uv);
float2 cellUV = uv - floor(uv); // Contains the UV of each cell
return float4(cellUV.xy, 0.0, 1.0);


Now, using the cellUV, I will draw a disk at the center of each grid cell:

The DrawDisk function uses smoothstep to create a smooth falloff. It fades from 1.0 to 0.0 as the distance goes from diskRadius * 0.5 to diskRadius. This creates a soft edge on the particle.

float DrawDisk(float2 uv, float2 diskCenter, float diskRadius)
{
	float distanceToCenter = distance(uv, diskCenter);
	return smoothstep(diskRadius, diskRadius * 0.5, distanceToCenter);
}

float4 frag(FragmentData input) : SV_Target0
{
	...
	float disk = DrawDisk(cellUV, float2(0.5, 0.5), 0.1);
	return float4((float3)disk, 1.0) + float4(cellUV.xy, 0.0, 1.0) * 0.1;
}


Now, I will animate the position of each point using sine waves:

float2 diskCenter = cos(_Time.yy * float2(2.0, 5.3)) * 0.5 + 0.5;
float disk = DrawDisk(cellUV, diskCenter, 0.1);

The particles now move along small looping paths. All particles move in sync because they use the same time value.


Now, each point in each grid cell moves in the same way. So I will use a hash function on cell ID to randomize it:

The FastHash2_2 function takes a 2D input and returns a random-looking 2D output. This is a deterministic hash that always returns the same output for the same input. I'm using a simple integer-based hash function (implementation not shown, but any 2D hash will work).
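I won't list my exact implementation here, but if you need one, something in this spirit works just as well. This sketch is based on the widely used "hash without sine" construction rather than the integer-based hash from my project:

// 2D-in, 2D-out hash: deterministic, returns pseudo-random values in the 0-1 range.
float2 FastHash2_2(float2 p)
{
	float3 p3 = frac(float3(p.xyx) * float3(0.1031, 0.1030, 0.0973));
	p3 += dot(p3, p3.yzx + 33.33);
	return frac((p3.xx + p3.yz) * p3.zy);
}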

float2 cellIDHash = FastHash2_2(cellID);
float2 diskCenter = cos(_Time.yy * cellIDHash * 5.0) * 0.5 + 0.5; // Randomize the speed in each axis


Now each particle moves at a different speed and direction, creating natural variation.


Now I need to fix the issue with the boundaries and keep each disk within its cell bounds:

float2 cellIDHash = FastHash2_2(cellID);
float2 diskCenter = cos(_Time.yy * cellIDHash * 5.0) * 0.5 + 0.5;
diskCenter = lerp(DiskRadius, 1.0 - DiskRadius, diskCenter); // Keep disk inside the cell
float disk = DrawDisk(cellUV, diskCenter, DiskRadius);


And now, let's make the UV scroll.

// Just before the cells are calculated
uv.y += _Time.y * 1.5;


Looks like snow to me!

And this is the full fragment shader code so far:

float DrawDisk(float2 uv, float2 diskCenter, float diskRadius)
{
	float distanceToCenter = distance(uv, diskCenter);
	return smoothstep(diskRadius, diskRadius * 0.5, distanceToCenter);
}

static float DiskRadius = 0.1;

float4 frag(FragmentData input) : SV_Target0
{ 
	float2 uv = input.uv.xy;
	uv.x *= _ScreenParams.x / _ScreenParams.y;

	uv *= 4.0;
	uv.y += _Time.y * 1.5;
	
	float2 cellID = floor(uv);
	float2 cellUV = uv - floor(uv);

	float2 cellIDHash = FastHash2_2(cellID);
	float2 diskCenter = cos(_Time.yy * cellIDHash * 2.0) * 0.5 + 0.5;
	diskCenter = lerp(DiskRadius, 1.0 - DiskRadius, diskCenter);
	float disk = DrawDisk(cellUV, diskCenter, DiskRadius);

	return float4((float3)disk, 1.0);
}


___

Making it render on a 3D plane

Now I have nice snow particles that work in 2D, but I need to make them work in 3D.

Why do we need 3D? The ShaderToy shader renders a 3D scene using raymarching. To integrate our particles with that scene, they need to exist in the same 3D space. Otherwise, they would just be a flat overlay that doesn't interact with the 3D objects.

In a ShaderToy shader, the rendering uses raymarching and each pixel has a ray that goes from the camera towards the scene. I will recreate this setup in Unity and I will render those particles on a 3D plane.


___

Ray origin and direction

To create the ray origin and ray direction of the camera for each pixel, I will:

  1. Modify the vertex shader to include a world space position.

  2. Use the world space position and camera position in the fragment shader to reconstruct the ray origin and ray direction.

So, let's start by modifying the interpolators:

struct FragmentData
{
	float4 positionCS : SV_Position;
	float2 uv : TEXCOORD0;
	float4 positionWS : TEXCOORD1; // Added world space position
};

But how can I get the world space position in a fullscreen draw? Well, I have the clip space position of the fullscreen quad, so I can use the inverse view-projection matrix to reconstruct the world space position. As shown in the image below, I will move from NDC into world space.

FragmentData vert(uint vertexID : SV_VertexID)
{
	FragmentData output;

	output.positionCS = float4(quadUVs[vertexID] * 2.0 - 1.0, 0.5, 1.0);
	output.uv = quadUVs[vertexID];
	
	// Reconstruct world space position from the clip space
	output.positionWS = mul(UNITY_MATRIX_I_VP, output.positionCS);
	output.positionWS /= output.positionWS.w;


And then I can use it in the fragment shader to recreate the ray direction and ray origin:

float4 frag(FragmentData input) : SV_Target0
{ 
	float3 rayOriginWS = _WorldSpaceCameraPos.xyz;
	float3 rayDirectionWS = normalize(input.positionWS.xyz - rayOriginWS.xyz);

	// Debug ray direction as a color
	return float4(rayDirectionWS, 1.0);
}

It looks like it works correctly, because the ray direction colors reflect the colors of the world space coordinate handles:


___

Raycasting a plane

Now I will use raycasting to hit a plane in 3D. I will create a function based on this infographic:


This is the function that does the plane raycast.

The math here is based on the plane equation. A plane is defined by a normal vector and a distance from the origin (offset). The function calculates how far along the ray we need to travel to hit the plane. It returns the distance t, where hitPosition = rayOrigin + rayDirection * t.

float RaycastPlane(float3 rayOrigin, float3 rayDirection, float3 planeNormal, float offset)
{
	// Returns the distance to the hit
	return (offset - dot(planeNormal, rayOrigin)) / dot(planeNormal, rayDirection);
}


And I used it to raycast the world-space XY plane (its normal points along the Z axis):

float4 frag(FragmentData input) : SV_Target0
{ 
	float3 rayOriginWS = _WorldSpaceCameraPos.xyz;
	float3 rayDirectionWS = normalize(input.positionWS.xyz - rayOriginWS.xyz);

	// Raycast the plane
	float hitT = RaycastPlane(rayOriginWS.xyz, rayDirectionWS.xyz, float3(0.0, 0.0, 1.0), 0.0);
	
	// Calculate hit position
	float3 hitPositionWS = rayOriginWS.xyz + rayDirectionWS.xyz * hitT;

	// Calculate the UV from the plane. It is the XY plane so I can just use that.
	float2 uv = hitPositionWS.xy;
	uv.y += _Time.y * 1.5;

	float2 cellID = floor(uv);
	float2 cellUV = uv - floor(uv);


From the side, it looks like this:


___

Making it volumetric

Now, all the particles fall on a single flat surface. To make the effect 3D, I will create multiple planes of particles, each positioned at a different offset.


I will create a for loop in the shader code where I will iterate a few times to render many layers of those particles:

// Added constants for controlling layers count and distance between them.
const static float LayersCount = 10.0;
const static float DistanceBetweenLayers = 1.0;

float4 frag(FragmentData input) : SV_Target0
{ 
	float3 rayOriginWS = _WorldSpaceCameraPos.xyz;
	float3 rayDirectionWS = normalize(input.positionWS.xyz - rayOriginWS.xyz);

	float sum = 0.0;

	// Iterate few layers
	for (float i = 0.0; i < LayersCount; i++)
	{
		// Different plane offset for each layer
		float planeOffset = i * DistanceBetweenLayers;

		float hitT = RaycastPlane(rayOriginWS.xyz, rayDirectionWS.xyz, float3(0.0, 0.0, 1.0), planeOffset);
		if (hitT < 0.0)	// Skip particles that would be behind the camera
			continue;
		
		float3 hitPositionWS = rayOriginWS.xyz + rayDirectionWS.xyz * hitT;

		float2 uv = hitPositionWS.xy;
		uv.y += _Time.y * 1.5;

		float2 cellID = floor(uv);
		float2 cellUV = uv - floor(uv);

		float2 cellIDHash = FastHash2_2(cellID);
		float2 diskCenter = cos(_Time.yy * cellIDHash * 2.0) * 0.5 + 0.5;
		diskCenter = lerp(DiskRadius, 1.0 - DiskRadius, diskCenter);
		float disk = DrawDisk(cellUV, diskCenter, DiskRadius);

		// Accumulate the particles additively
		sum += disk;
	}

	return float4((float3)sum, 1.0);
}


Now each layer looks the same, so I will include the layer ID in the cellIDHash calculation to make each layer unique:

float2 cellIDHash = FastHash2_2(cellID * 13.8974 + i * 7.39712);

Nice! It is hard to notice any planes!


There is a problem: those layers are always at fixed positions, and I want them to always spawn relative to the camera. Look at what happens when I move the camera:


I will modify the loop to begin rendering from the plane that is closest to the camera. I will also snap all the planes to multiples of the distance between layers.


This is the modified code:

float3 planeNormal = float3(0.0, 0.0, 1.0);

// Snap the first plane offset to the closest one to the camera
float startPlaneOffset = floor(dot(rayOriginWS.xyz, planeNormal) / DistanceBetweenLayers) * DistanceBetweenLayers;

for (float i = 0.0; i < LayersCount; i++)
{
	// Different plane offset for each layer
	float planeOffset = startPlaneOffset + i * DistanceBetweenLayers;

	float hitT = RaycastPlane(rayOriginWS.xyz, rayDirectionWS.xyz, planeNormal, planeOffset);
	if (hitT < 0.0)
		continue;

	float3 hitPositionWS = rayOriginWS.xyz + rayDirectionWS.xyz * hitT;

	float2 uv = hitPositionWS.xy;
	uv.y += _Time.y * 1.5;

	float2 cellID = floor(uv);
	float2 cellUV = uv - floor(uv);

	// I modified the cell hash to use planeOffset instead of i
	float2 cellIDHash = FastHash2_2(cellID * 13.8974 + planeOffset * 7.39712);
	float2 diskCenter = cos(_Time.yy * cellIDHash * 2.0) * 0.5 + 0.5;
	diskCenter = lerp(DiskRadius, 1.0 - DiskRadius, diskCenter);
	float disk = DrawDisk(cellUV, diskCenter, DiskRadius);

	sum += disk;
}


Now I will make the furthest and closest particles smoothly fade away to avoid this popping effect in the distance.

for (float i = 0.0; i < LayersCount; i++)
{
	float planeOffset = startPlaneOffset + i * DistanceBetweenLayers;

	...

	// Get distance from the camera to the plane surface
	float distanceToPlane = planeOffset - dot(planeNormal, rayOriginWS);

	// Fade further particles
	float alpha = smoothstep((LayersCount - 1.0) * DistanceBetweenLayers, 0.0, distanceToPlane);

	// Fade closest particles
	alpha *= smoothstep(DistanceBetweenLayers * 0.5, DistanceBetweenLayers * 2.0, distanceToPlane);

	...
	
	// Accumulate the particles additively, use the calculated alpha
	sum += disk * alpha;
}

Sweet!


___

Making it work with any rotation

Currently, I implemented it using XY planes. However, it would be nice to control the rotation of those particles to make them more aligned with specific camera angles. The original shader has a camera that is tilted a little, and it would be nice if the particles could face the camera.

To make this happen, I will use a rotation matrix. So the rendering will stay the same. I will just rotate the rayOrigin and rayDirection before the calculations happen.

This is my helper function for creating a 3D rotation matrix from Euler angles:

float3x3 Get3DRotationMatrix(float3 angles)
{
	float3x3 mat = float3x3(
		1.0, 0.0, 0.0, 
		0.0, 1.0, 0.0, 
		0.0, 0.0, 1.0
	);

	mat = mul(float3x3(
		cos(angles.z), sin(angles.z), 0.0, 
		-sin(angles.z), cos(angles.z), 0.0, 
		0.0, 0.0, 1.0
	), mat);

	mat = mul(float3x3(
		1.0, 0.0, 0.0,
		0.0, cos(angles.x), sin(angles.x),
		0.0, -sin(angles.x), cos(angles.x)
	), mat);

	mat = mul(float3x3(
		cos(angles.y), 0.0, sin(angles.y), 
		0.0, 1.0, 0.0, 
		-sin(angles.y), 0.0, cos(angles.y)
	), mat);

	return mat;
}

I modified the code to use a rotation matrix on ray origin and ray direction. Since rayOrigin and rayDirection are no longer world space variables, I used a PS suffix for particle space.

Coordinate space naming convention:

  • WS = World Space (the global 3D coordinate system)

  • PS = Particle Space (a local coordinate system rotated for the particles)

float3 rayOriginWS = _WorldSpaceCameraPos.xyz;
float3 rayDirectionWS = normalize(input.positionWS.xyz - rayOriginWS.xyz);

// Creating some rotation matrix
float3x3 rotationMatrix = Get3DRotationMatrix(float3(0.9, 1.1, 0.0));

// Particle movement can be defined in world space now:
float3 movementOffsetWS = float3(0.0, 1.0, 0.0) * _Time.y * 1.5;

// Rotating ray origin and ray direction from world space to a particle space
float3 rayOriginPS = mul(rotationMatrix, rayOriginWS.xyz + movementOffsetWS); 
float3 rayDirectionPS = mul(rotationMatrix, rayDirectionWS.xyz); 

	// I changed the later code to use particle space (PS) instead of world space (WS)

Notice that the particles still fall down and the layers are smoothly scrolling.


___

Particle intersection with the scene

Now the particles work as a fullscreen effect. However, I need to figure out a way to blend them with the opaque scene.

I placed a big opaque cube on the scene and modified the shader to blend with the background.

sum = clamp(sum, 0.0, 1.0);
return float4(1.0, 1.0, 1.0, sum);


All the particle planes are rendered in a single fragment shader, so to make the particles intersect with the opaque scene content, I need to implement the depth testing in the shader. Let's do this.

I included a camera depth texture with a point clamp sampler in the shader code:

Texture2D _CameraDepthTexture;
SamplerState pointClampSampler;


Then I added this code at the beginning of the fragment shader to get the distance to the scene.
distanceToSceneWS will store the distance from the camera position to the opaque surface of the scene.

// Sample raw depth value from the depth texture
float rawScreenDepth = _CameraDepthTexture.SampleLevel(pointClampSampler, input.uv, 0.0f).r;

// Reconstruct the NDC position from screen UV and raw depth value.
float4 scenePositionCS = float4((input.uv.xy * 2.0 - 1.0), rawScreenDepth, 1.0); 

// Convert NDC to world space position
float4 scenePositionWS = mul(UNITY_MATRIX_I_VP, scenePositionCS); 
scenePositionWS /= scenePositionWS.w;

// Calculate distance from camera to the opaque scene position in world space
float distanceToSceneWS = distance(_WorldSpaceCameraPos.xyz, scenePositionWS);

// Optional visualization of a scene distance - grayscale from 0 to 20.
return float4((float3)smoothstep(0.0, 20.0, distanceToSceneWS), 1.0);


Visualizing the distance to scene. White areas are far from the camera, black areas are close. The cube is clearly visible as a dark silhouette.

Then I integrated it into the particle logic. The variable hitT in the loop is the distance along the ray from the camera to the particle plane, so I can just check whether this distance is larger than the distance to the scene:

	if (hitT < 0.0 || hitT > distanceToSceneWS) // Skip if behind camera or behind scene geometry
		continue;

Looks good!


___

Extracting a particle rendering method

My goal was to implement the particles in the ShaderToy shader. Let's think about what is needed to fully render those particles.

Before integrating with ShaderToy, I need to organize the code. ShaderToy uses GLSL, and the particle logic is currently mixed with Unity-specific code. I will extract the particle rendering into a clean function that only needs basic inputs. This will make the port to GLSL much easier.

Looking at the source code:

  1. I need a ray origin and ray direction.

  2. I need a matrix to rotate the space.

  3. I need a distance to the opaque scene.

Now, I will modify the source code and move the particle rendering logic into a separate function:

float GetFallingParticles(float3 rayOriginWS, float3 rayDirectionWS, float3x3 rotationMatrix, float distanceToSceneWS)
{
	float3 movementOffsetWS = ParticleMovementOffsetWS * _Time.y * 1.5;

	float3 rayOriginPS = mul(rotationMatrix, rayOriginWS.xyz + movementOffsetWS); 
	
	// All the particle rendering logic goes here (raycasting planes, drawing disks, etc.)
	// This includes: calculating plane intersections, generating cell UVs,
	// hashing cell IDs for randomness, and accumulating particle opacity
	...

	return sum;
}

// And this is the fragment shader now:
float4 frag(FragmentData input) : SV_Target0
{ 
	// Get distance to opaque scene
	float rawScreenDepth = _CameraDepthTexture.SampleLevel(pointClampSampler, input.uv, 0.0f).r;
	float4 scenePositionCS = float4((input.uv.xy * 2.0 - 1.0), rawScreenDepth, 1.0); 
	float4 scenePositionWS = mul(UNITY_MATRIX_I_VP, scenePositionCS); 
	scenePositionWS /= scenePositionWS.w;
	float distanceToSceneWS = distance(_WorldSpaceCameraPos.xyz, scenePositionWS);

	// Get ray origin and direction
	float3 rayOriginWS = _WorldSpaceCameraPos.xyz;
	float3 rayDirectionWS = normalize(input.positionWS.xyz - rayOriginWS.xyz);

	// Get some rotation matrix
	float3x3 rotationMatrix = Get3DRotationMatrix(float3(0.9, 1.1, 0.0));

	// Render particles
	float particles = GetFallingParticles(rayOriginWS, rayDirectionWS, rotationMatrix, distanceToSceneWS);

	return float4(1.0, 1.0, 1.0, particles);
}

I did that because it will be easier to port just the GetFallingParticles function into GLSL in ShaderToy to integrate into other shaders.


___

Fixing a precision bug

Before moving forward, I need to fix a bug I discovered. When the scene is reloaded in Unity, the particles start to pop for some reason. It looks like the randomness for each layer is messed up.


My first bet is that there is some imprecision in the hash function input that randomizes each plane. It was using a plane offset that could be a little bit different depending on the camera position. So I modified the particles and their hash function to work on the plane ID instead of the plane offset.

// Calculated startPlaneOffsetID, and then startPlaneOffset
float startPlaneOffsetID = floor(dot(rayOriginPS.xyz, planeNormal) / DistanceBetweenLayers);
float startPlaneOffset = startPlaneOffsetID * DistanceBetweenLayers; // Multiplying here and in the loop created imprecision

for (float i = 0.0; i < LayersCount; i++)
{
	...

	// Modified the hash function input to use plane ID instead of planeOffset
	float planeID = startPlaneOffsetID + i; // No multiplication. This solved the issue
	float2 cellIDHash = FastHash2_2(cellID * 13.8974 + (planeID) * 7.39712);


And the issue is fixed:


___

Let's integrate the particles with the diorama

Finally it's time to move the code to GLSL and use it in ShaderToy!

Now that the particles work well in Unity, I need to port them back to ShaderToy to fix the original shader. This involves converting HLSL to GLSL and removing Unity-specific code.


___

Porting the code from HLSL to GLSL

I will take the easy route and use GPT o1 to translate the particle source code from HLSL into GLSL:
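Regardless of whether the translation is done by hand or by an LLM, it mostly boils down to a handful of mechanical substitutions (a rough, non-exhaustive list):

  • float2 / float3 / float4 become vec2 / vec3 / vec4, and float3x3 becomes mat3.

  • frac becomes fract, lerp becomes mix, and Unity's _Time.y becomes ShaderToy's iTime.

  • mul(M, v) becomes M * v, and matrices built from scalars need their constructor arguments transposed, because HLSL fills float3x3(...) row by row while GLSL fills mat3(...) column by column.

  • Texture sampling (SampleLevel on a Texture2D) becomes textureLod on an iChannel sampler.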


I pasted the generated code at the top of the diorama shader in ShaderToy.


___

Removing original snow particles

I removed all the snow-related code from the original shader. Basically, I removed the getSnowflake function and then removed all the code that was causing compilation issues. These are all the code fragments:


And this is how it looks now:


___

Accessing all required variables for particle rendering

Now I need to access the ray direction, ray origin, camera rotation matrix, and distance to the scene. In the code, I have access to the first three here:

ro - ray origin
rd - ray direction
cam - camera rotation matrix


For the scene distance, I need to modify the rayMarch function. I will add one more output to this function:

// I added the out_sceneDistance output
vec3 rayMarch(vec3 ro, vec3 rd, out float out_sceneDistance)
{
	...
	
    for (int i = 0; i < STEPS; i++) 
    {
        vec3 p = ro + rd * t;     // Position along the ray
        vec2 result = map(p);     // Current distance to the scene

        float d = result.x;
        ...
        
        // t accumulates the marched distance each step
        t += d;                   // "March" the ray
        
        // So I added this line to output the scene distance
        out_sceneDistance = t;


___

Integrating the particles

Now it's time to modify the mainImage function to integrate the particles with the shader:

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec2 uv = (fragCoord - .5 * iResolution.xy) / iResolution.y;

    vec3 ro = vec3(4, 4, -4);               // Ray origin
    vec3 target = vec3(0, 0, 0);            // Looking at the center of scene
    mat3 cam = camera(ro, target, 0.);
    vec3 rd = cam * normalize(vec3(uv, 1)); // Ray direction
    vec3 col = vec3(0);                     // Final pixel color
    
    // I modified the rayMarch to output the scene distance
    float sceneDistance = 9999.0;
    col = rayMarch(ro, rd, sceneDistance);
    
    // The cam matrix was a camera to world matrix, but I need world to camera to align the particles.
    // So I transposed the cam matrix to create an inverse transformation
    mat3 rotationMatrix = transpose(cam); 
    
    // Getting the particles using ray origin, ray direction, rotation matrix, and scene distance
    float particles = GetFallingParticles(ro, rd, rotationMatrix, sceneDistance);
    
    // Blending the particles with the background color
    col = mix(col, vec3(1.0), particles);
    
    fragColor = vec4(col, 1.0);
}


Before:

The original shader with glitching snow particles.

Link to the original shader: https://www.shadertoy.com/view/wfKcDK


After:

The fixed shader with smooth, non-glitching snow particles that properly integrate with the 3D scene.

You can look at the modified shader here: https://www.shadertoy.com/view/WcGfDz


___

Optimization iterations

The particles work, but are they fast? Let's profile them and find out where the bottlenecks are.

I strongly believe that most of the features we implement can be easily optimized 2-3x out of the box just by opening the profiler, looking at the most important issue, and fixing it.

In this section, I will show 5 optimization iterations. Each one targets a specific bottleneck found in the profiler.


___

First profiling

I will profile this view in a Unity application build, on an RTX 3060, at Full HD resolution, with 20 particle layers.


And these are the results. As expected, the shader is SM-bound in the fragment shader:


Let's look deeper and see what instructions are the most problematic:


It looks like there are a few special instructions that stall the program, and the biggest issue is with the FP32 pipe, which does simple math operations. The shader also doesn't use texture fetches.

GPU terminology:

  • FP32 pipe: The part of the GPU that does 32-bit floating point math (add, multiply, etc.)

  • XU: handles special functions like sin, cos, sqrt

  • Texture fetches: Reading from textures, handled by dedicated texture units

Also, the shader's DXIL disassembly contains one non-unrolled loop:


Currently I see two ways of optimizing this shader:

  • Loop unrolling. Could enable the compiler to do more optimizations.

  • Using texture samples for hash functions.


___

1st optimization iteration - loop unrolling

First of all, I added the unroll attribute to the particle rendering loop. This should strip away the loop logic from the shader and enable the compiler to do more optimizations. The shader was spending 5-6% of its time on branching, so this should make it roughly 5-6% faster:

[unroll] // Added the attribute here
for (float i = 0.0; i < LayersCount; i++)

And 0.04ms shaved (from 0.60ms to 0.56ms), as expected! Not much, but it's honest work.


___

2nd optimization iteration - using texture-based hash function

The next idea is to utilize the texturing units to speed up the hash functions. Currently the shader doesn't use textures for hash calculations, so those texture units are idle. So I used this small texture to sample a random value in the shader, instead of using an integer-based hash:


I added this function:

float2 TextureHash2_2(float2 uv)
{
	return _NoiseRGBA.SampleLevel(pointRepeatSampler, uv * 0.021, 0).rb;
}


And added a UV offset for each particle layer:

float3 hitPositionPS = rayOriginPS.xyz + rayDirectionPS.xyz * hitT;

float2 uv = hitPositionPS.xy;
uv += planeID * 13.71; // Added this UV offset

float2 cellID = floor(uv);
float2 cellUV = uv - floor(uv);

float2 cellIDHash = TextureHash2_2(cellID + planeID * 13.489); // Modified this hash
float2 diskCenter = cos(_Time.yy * cellIDHash * 2.0) * 0.5 + 0.5;


And maybe this wasn't the best idea. The shader is a little bit faster, but I expected more. Depending on how the next iterations go, I may revert these changes.

The profiler shows only marginal improvement. The shader went from 0.56ms to about 0.50ms.


And it stays FP32-bound, now together with texture fetches:


___

3rd optimization iteration - simplifying the shader code

Now I will focus on actual code simplification. I was able to make this code run 10% faster just by experimenting with textures and loop unrolling, which is nice, but I will not optimize anything further without modifying the shader itself.

The disk drawing uses a distance function, which involves a square root calculation. I could replace it by using a squared distance. So I replaced the DrawDisk function with this one:

float DrawDisk(float2 uv, float2 diskCenter, float diskRadius)
{
	float distanceToCenter = dot(uv - diskCenter, uv - diskCenter) - diskRadius * diskRadius;
	return smoothstep(0.0, -diskRadius * diskRadius, distanceToCenter);
}


Currently, the loop raytraces each plane separately. I can simplify this by raytracing just the first and second planes, and then using the difference between the hits to derive all the other hits. I did that, and I also simplified some calculations in the loop (replaced smoothstep with a simple division). I tried each optimization idea separately, and it was a lot of trial and error.

// Calculate first hit
float firstHitT = RaycastPlane(rayOriginPS.xyz, rayDirectionPS.xyz, planeNormal, startPlaneOffset);
float firstPlaneDistance = startPlaneOffset - dot(planeNormal, rayOriginPS);

// Calculate the second hit
float secondHitT = RaycastPlane(rayOriginPS.xyz, rayDirectionPS.xyz, planeNormal, startPlaneOffset + DistanceBetweenLayers);
float secondPlaneDistance = (startPlaneOffset + DistanceBetweenLayers) - dot(planeNormal, rayOriginPS);

// Calculate deltas
float hitTDelta = secondHitT - firstHitT;
float planeDistanceDelta = secondPlaneDistance - firstPlaneDistance;

// Distance to the furthest plane
float maxDistance = (LayersCount - 1.0) * DistanceBetweenLayers;

[unroll]
for (float i = 0.0; i < LayersCount; i++)
{
	float planeID = startPlaneOffsetID + i;
	
	// Use the delta to calculate hitT and distance to plane.
	// Removing the usage of the RaycastPlane simplified the math here.
	float hitT = firstHitT + hitTDelta * i; 
	float distanceToPlane = firstPlaneDistance + planeDistanceDelta * i;

	[flatten]
	if (hitT < 0.0 || hitT > distanceToSceneWS)
		continue;

	// Replaced smoothstep with a different formula. For me, it is harder to understand, but it reduced math calculations
	float alpha = clamp(1.0 - distanceToPlane / maxDistance, 0.0, 1.0);
	alpha *= clamp(distanceToPlane / DistanceBetweenLayers * 2.0, 0.0, 1.0);

	float3 hitPositionPS = rayOriginPS.xyz + rayDirectionPS.xyz * hitT;

	float2 uv = hitPositionPS.xy;
	uv += planeID * 13.71;

	float2 cellID = floor(uv);
	float2 cellUV = uv - floor(uv);

	// Using a texture-based hash distributes some of the workload to the texturing units.
	float2 cellIDHash = TextureHash2_2(cellID + planeID * 11.489);
	float2 diskCenter = cos(_Time.yy * cellIDHash * 2.0) * 0.5 + 0.5;
	diskCenter = lerp(DiskRadius, 1.0 - DiskRadius, diskCenter);
	
	// Now it doesn't use square root
	float disk = DrawDisk(cellUV, diskCenter, DiskRadius); 

	sum += disk * alpha;
}


Well, it didn't get much faster in the end. It went from 0.50ms to 0.45ms. But I'm slowly getting there.


However, the bottleneck shifted completely from the FP32 pipe to the XU, and the shader is now rarely stalled.


To optimize it further, I would need to remove the floor() and cos() instructions and replace them with something more efficient. However, I was never able to come up with a cos approximation or a floor replacement that is actually faster, so I think the shader logic is pushed to its limits.


___

4th optimization iteration - playing with parameters

I will just play with the parameters. There's not much to tweak here: I can reduce the layer count, increase the distance between the layers, and increase the particle size.

It should make the particles look more or less the same on the screen.


Before:

After:


And it got faster. From 0.45ms to 0.35ms.


___

5th optimization iteration - baking the particle positions

To be honest, I was sure that at this point I would cut the time of this effect by half compared to the first profiling, but I'm not giving up. I need to shave just 0.05ms to achieve that! And I have an idea!

I could render particle positions into a texture in a separate render pass.

Currently, the particle position is calculated for each rendered pixel, while it could be prebaked into a texture and calculated once per cell.

My idea is to render a 64x64 texture where each pixel will contain the baked particle position. This would allow me to completely skip the particle position calculation in the particle shader code.

So I created a renderer feature that bakes this texture each frame. I used URP's render graph:

public override void RecordRenderGraph(RenderGraph renderGraph, ContextContainer frameData)
{
	var resources = frameData.GetOrCreate<ParticleBakingResourceData>();

	TextureDesc desc = new TextureDesc(64, 64)
	{
		name = "ParticleBakingTexture",
		colorFormat = GraphicsFormat.R16G16_UNorm, // Use only 2 channels with 16-bit precision for 0 to 1 values
		clearBuffer = true,
		clearColor = Color.clear,
		filterMode = FilterMode.Point,
		wrapMode = TextureWrapMode.Repeat
	};

	resources.bakedTexture = renderGraph.CreateTexture(desc);

	using (var builder = renderGraph.AddRasterRenderPass(nameof(ParticleBakingPass), out PassData passData))
	{
		builder.AllowPassCulling(false);
		builder.SetRenderAttachment(resources.bakedTexture, 0, AccessFlags.Write);
		builder.SetGlobalTextureAfterPass(resources.bakedTexture, Shader.PropertyToID("_BakedParticlesData"));

		passData.material = material;

		builder.SetRenderFunc((PassData data, RasterGraphContext context) =>
		{
			context.cmd.DrawProcedural(Matrix4x4.identity, data.material, 0, MeshTopology.Triangles, 6, 1);
		});
	}
}


And I baked the particle positions using this shader:

const static float DiskRadius = 0.05;

float4 frag(FragmentData input) : SV_Target0
{
	float2 cellIDHash = FastHash2_2(input.uv * 3.489712);
	float2 diskCenter = cos(_Time.yy * cellIDHash * 2.0) * 0.5 + 0.5;
	diskCenter = lerp(DiskRadius, 1.0 - DiskRadius, diskCenter);

	return float4(diskCenter, 0.0, 1.0);
}

This is how this texture looks. In the RG channels, it encodes the position of each particle:

The texture looks like colored noise. Red channel = X position, Green channel = Y position. Each pixel represents one particle's position within its cell.


Then, when rendering the particles, instead of recalculating the position, I just sample the texture:

float2 cellID = floor(uv);
float2 cellUV = uv - floor(uv);

// After the cell UV and ID is calculated, sample the particle position from the texture
float2 diskCenter = _BakedParticlesData.SampleLevel(pointRepeatSampler, cellID * bakedDataTexelSize.xy, 0.0);
float disk = DrawDisk(cellUV, diskCenter, DiskRadius);

Results:
The time dropped from 0.35ms to 0.27ms. This is a significant improvement at this point.


And rendering the texture with baked particle positions takes ~0.032us (0.000032ms). It was actually hard to find it in the profiler. When I count the empty space around this performance marker, it would be closer to 0.0004ms.


This is how the final effect looks.


___

Summary and key takeaways

In this article, I showed how to create realistic 3D snow particles using only fragment shaders. Here are the key techniques:

Core technique:

  • Grid-based particle placement using fract and floor functions

  • Multiple raycast planes to create volumetric depth

  • Hash functions for randomization per cell

  • Smooth fading to hide plane boundaries

Performance optimizations:

  • Unrolling the layer loop.

  • Moving hash calculations into a small noise texture to use the idle texture units.

  • Simplifying the per-layer math (squared-distance disk test, deriving plane hits from deltas, simpler fades).

  • Tuning the layer count, layer spacing, and particle size.

  • Baking particle positions into a small texture in a separate render pass.

Final performance: 0.27ms per frame (started at 0.60ms). That's a 2.2x speedup.

Limitations:

  • Particles are displayed on aligned planes, so they work well only for specific camera angles.

  • They are always rendered as a fullscreen pass.

  • Fixed particle density.

  • It is more of a ShaderToy gimmick and a nice graphics programming exercise than a feature you would use in production.

The full shader code is available on ShaderToy: https://www.shadertoy.com/view/WcGfDz

Hungry for more?

I share rendering and optimization insights every week.


I write expert content on optimizing Unity games, customizing rendering pipelines, and enhancing the Unity Editor.

Copyright © 2025 Jan Mróz | Procedural Pixels
