Rendering

Shell texturing vs raymarching

Sep 3, 2025

10 minutes

Shell texturing is a rendering method that draws multiple offset layers of a surface to add depth or detail extending from that surface. In this article I will explain the main performance problem with shell texturing and the options for optimizing it.

Examples

Some games use shell texturing to render effects like fog, fur, grass, energy shields, or similar visuals.

:image-description:

Biomutant used shell texturing to render fur. Porky Puff shows visible shell texturing artifacts, where the individual shell layers can be seen. If the game is good, who cares about the artifacts?


:image-description:

Kerbal Space Program used shell texturing for the reentry heating effect. The game renders the spaceship, then renders it multiple times as distorted shell layers to draw the flames.


How it works

Shell rendering usually works by rendering the object multiple times, each time offsetting the vertices slightly along their normals. This makes the object look like it is surrounded by a shell of the effect.

I implemented volumetric fog using shell texturing to show what is possible with this rendering technique.

:center-px:

:image-description:

Using 5 shell layers to draw volumetric fog. You can see 5 quads are drawn.


:center-px:

:image-description:

Using 10 shell layers to draw volumetric fog. You can see 10 quads are drawn.


:center-px:

:image-description:

Using 40 shell layers to draw volumetric fog. You can see 40 quads are drawn. It got dense.


The effect can be achieved in multiple ways. Here are some of my ideas:

1. Prepare a separate mesh. Duplicate the mesh vertices once for every layer you want to render. Add an additional "shell offset" attribute to each vertex and make it different for each layer.

2. Render a single mesh many times, each time with different shader properties.

3. Render a mesh using instancing. Render as many instances as you want layers. Use the SV_InstanceID semantic in your vertex shader as the shell ID (see the sketch after this list).

4. Use a geometry shader while rendering the original mesh. For each input triangle, emit additional "shell" triangles.
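
To make option 3 more concrete, here is a minimal vertex shader sketch of the instancing approach. It is only a sketch under assumptions: it reuses the property names (_ShellOffset, _LayerCount) and the Unity transform helpers that appear in the shaders later in this article, and the structs are simplified.

struct VertexData
{
   float3 positionOS : POSITION;
   float3 normalOS : NORMAL;
   float2 uv : TEXCOORD0;
};

struct FragmentData
{
   float4 positionCS_SV : SV_POSITION;
   float2 uv : TEXCOORD0;
   float shellOffset : TEXCOORD1;
};

FragmentData vert(VertexData input, uint instanceID : SV_InstanceID)
{
   FragmentData output;

   // Normalized layer index in [0, 1] - instance 0 is the base surface, the last instance is the outermost shell.
   float shellOffset = instanceID / (_LayerCount - 1.0);

   // Offset the vertex along its normal, exactly like the per-vertex attribute approach.
   float3 positionOS = input.positionOS + input.normalOS * shellOffset * _ShellOffset;

   output.positionCS_SV = TransformObjectToHClip(positionOS);
   output.uv = input.uv;
   output.shellOffset = shellOffset;
   return output;
}

On the CPU side, the draw is issued with as many instances as layers (in Unity, for example, via Graphics.RenderMeshInstanced).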


___

Shell texturing performance

Shell texturing is a bandwidth-intensive rendering method. For each layer, the GPU computes a color and blends it with the color buffer. Each pixel of each layer consumes memory bandwidth to read the color buffer, blend the color, and write the value back.
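
As a rough back-of-envelope estimate (assuming, for illustration, a full-screen effect at 1920×1080 and an 8-byte RGBA16F color target, so about 16 bytes of read + write per blended pixel): 1920 × 1080 × 20 layers × 16 bytes ≈ 663 MB of color-buffer traffic per frame, before counting any texture sampling.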

With shell texturing, drawing more layers usually results in better image quality but lower performance. Let's measure its performance on an RTX 3060.

Let's look at this scene:

:image-description:

Scene with animated volumetric fog trails that use 20 additive transparent layers.


When profiled with the Nvidia Nsight Graphics GPU Trace Profiler on an RTX 3060 at Full HD, this fog took 1.12 ms to render on average (measured over 10 frames). It is bottlenecked by the Screen Pipe, which handles blending colors into the color buffer.

:center-px:

:image-description:

Profiling results of shell-textured volumetric fog. Screen Pipe is the main bottleneck because each layer needs to blend color into the color buffer.


I expect this rendering technique will be slower on GPUs with lower memory bandwidth. How can I optimize it?

How can it be optimized?

Let's think about how shell texturing can be optimized. You can always start with:

  • Reducing the layer count.

  • Reducing the size of a rendered object on the screen. Shell texturing performance depends on the number of covered pixels. For cinematics, it may help to change some camera angles.

Those are basic ideas. Let's consider some edge cases that are common in real applications.


Rendering shells without any offset

In some cases, I want to render the same object with transparency many times to increase the effect intensity. Imagine a nice energy shield effect. It is tempting to duplicate the hemisphere, create a new material with different parameters, and render it 2-4 times to intensify the effect.

However, duplicating transparent objects multiplies the number of rendered pixels and the VRAM bandwidth used. If the object is large, it will slow down rendering, especially on low-end devices.


How to optimize it?

Don't render the effect multiple times. Store the per-layer parameters in a buffer and use a for loop so that all the layering logic runs inside a single fragment shader.

A single shader execution will handle all the layers, but it will write to the screen only once!

Here is some pseudo code that illustrates the idea.

// Per-layer parameters stored in a buffer, filled once from the CPU side.
StructuredBuffer<LayerParameters> _LayerParameters;

float4 color = 0.0f;
// Iterate through each layer.
for (int i = 0; i < _LayersCount; i++)
{
   // For each layer, fetch its parameters and accumulate the color.
   LayerParameters parameters = _LayerParameters[i];
   color += GetLayerColor(uv, parameters);
}


:image-description:

Example layered effect. The same effect is stacked multiple times with different parameters. Check this shader in action here: https://www.shadertoy.com/view/3cByRd


Layers can be stacked in the shader code.

// First layer  
vec3 noise = Noise3D(RotateAroundY(effectPos, rotationTime * 3.0) + time).rgb * mask;  
col.rgb += pow(smoothstep(0.2, 0.8, noise.r) * vec3(1.0, 0.6, 0.3), vec3(2.0));

// Second layer  
noise = Noise3D(RotateAroundY(effectPos * 2.0, rotationTime * 1.0) - time).rgb * mask;  
col.rgb += pow(smoothstep(0.4, 0.8, noise.r) * vec3(1.2, 1.0, 0.3), vec3(2.0)) * 0.5;

// Third layer  
noise = Noise3D(RotateAroundY(effectPos * 5.0, rotationTime * 2.0) + time).rgb * mask;  
col.rgb += pow(smoothstep(0.3, 0.8, noise.r) * vec3(1.2, 0.6, 1.4), vec3(2.0)) * 0.2;

// Fourth layer
noise = Noise3D(RotateAroundY(effectPos * 7.0, rotationTime * 4.0) + time).rgb * mask;  
col.rgb += pow(smoothstep(0.3, 0.8, noise.r) * vec3(1.2, 1.2, 1.4), vec3(2.0)) * 0.1;

:image-description:

This example uses a lot of magic numbers. You usually want to expose those as parameters. I created this shader solely for this article to explain the idea.


In-shader raytracing

In many scenarios where shell texturing is used on a plane, sphere, or terrain, it is possible to implement it using ray marching instead. It works like shell texturing, but all the logic for rendering multiple offset layers is contained within a single fragment shader. As in the example above, the shader also handles the layers' geometry.

:image-description:

Replacing shell texturing with raymarching is not always possible. When it is, you reduce the cost of blending colors but increase the cost of raytracing the layers in the fragment code. In my experience, marching or raytracing in the shader is more efficient than shell texturing.


This is what the fragment shader needs to do to work the same way:

  • Handle the geometry of each layer (e.g., using raycasting or raymarching).

  • Depth-test against the opaque scene.

  • Accumulate the colors from multiple layers.


Original shell texturing shader

Let's look at the original shader, which assumes that all the shell layers are included in the rendered mesh.

FragmentData vert(VertexData input)  
{  
   FragmentData output;  
     
   // In this case, all the layers are in a mesh. 
   // There is a vertex attribute containing a shell offset.  
   // Here I use this shell offset to move vertices along their normals in object space.  
   float3 positionOS = input.positionOS.xyz + input.normalOS.xyz * input.shellOffset * _ShellOffset;  
     
   // Compute clip pos and forward the rest of the vertex parameters into the fragment shader.  
   output.positionCS_SV = TransformObjectToHClip(positionOS);  
   output.uv = input.uv;  
   output.shellOffset = input.shellOffset; // Vertices contain shell offsets.
   return output;  
}
float GetFogValue(float2 sampleUV, float offset)  
{  
   // Compute some noise value here.
   float2 uv = sampleUV;  
   float time = _Time.y * 0.15 * 0.0 - 0.1;
   uv.y += sin(uv.x * 18.0 + time * 5.4) * 0.05;
   float noiseValue = _FogTexture.SampleLevel(linearRepeatSampler, uv + time * 0.1, 0.0).r;  
   noiseValue *= smoothstep(0.25, 0.05, dot(sampleUV - 0.5, sampleUV - 0.5));  
   noiseValue = smoothstep(offset, offset + 0.15, noiseValue);  
   noiseValue *= 0.2;
   return noiseValue;  
}
float4 frag(FragmentData input) : SV_Target  
{  
   // Get the noise and output to the screen.
   return float4((float3)GetFogValue(input.uv, input.shellOffset) * _Color.rgb, 0.0);  
}

Look at the fragment shader. It is super simple: it only computes the noise and outputs a color to the screen. No complex logic there. The whole shell logic happens in the vertex shader. The GPU renders multiple layers and does the depth testing and color blending for us.
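
For context, here is a minimal sketch of the render state such a pass might use in Unity ShaderLab, assuming additive blending like the fog scene above (this block is my assumption and is not part of the original shader):

// Assumed render state for an additive, non-depth-writing shell pass.
Blend One One   // Additive blending: each layer adds its color to the color buffer.
ZWrite Off      // Transparent layers do not write depth.
ZTest LEqual    // But they are still depth-tested against the opaque scene.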

This is the effect:

I calculate the fog in object space. This allows me to adjust the fog position and compose it using objects in the hierarchy.


Optimized raymarching shader

Now, let's look at the optimized raymarched shader. First of all, I used a cube mesh, so I needed to adjust it a little to match the bounding box of all the shells. This is the vertex shader:

struct VertexData  
{  
  float3 positionOS : POSITION;  
};

struct FragmentData  
{  
  float4 positionCS_SV : SV_POSITION;  
  float3 positionWS : TEXCOORD0;  
  float4 positionCS : TEXCOORD1;  
};

FragmentData vert(VertexData input)  
{  
  FragmentData output;
  
  // I used the built-in cube mesh, so I need to adjust the scale a little.
  float3 positionOS = input.positionOS.xyz;  
  positionOS.y = positionOS.y * 0.5 + 0.25;  
  positionOS.y *= _ShellOffset * 2.0;
  
  float3 positionWS = TransformObjectToWorld(positionOS.xyz);  
  output.positionCS_SV = TransformWorldToHClip(positionWS);  
  output.positionCS = output.positionCS_SV; // Passing the clip space position, it will be used to sample depth.  
  output.positionWS = positionWS;  
  
  return output;  
}

Notice that I do not offset the vertices along the normals. The shader just draws the interior faces of a cube.

The key work happens in the fragment shader, which handles depth testing, raycasting layers, and accumulating the color.



The fragment shader code is explained in the comments:

float RaycastPlane(float3 rayOrigin, float3 rayDirection, float3 planeNormal, float planeOffset)  
{  
  // Utility function for raycasting the plane.  
  float denom = dot(planeNormal, rayDirection);  
  float t = -(dot(planeNormal, rayOrigin) + planeOffset) / denom;  
  return t;  
}

float GetFogValue(float2 sampleUV, float offset)  
{  
  /// The same code as in the previous shader.  
  ...  
}

float4 frag(FragmentData input) : SV_Target  
{  
  // Calculate screen UV - required to sample depth.  
  float2 screenUV = input.positionCS.xy / input.positionCS.w;  
  screenUV = screenUV * 0.5 + 0.5;  
  #if UNITY_UV_STARTS_AT_TOP  
   screenUV.y = 1.0 - screenUV.y;  
  #endif
  
  // Sample scene depth  
  float sceneDepth = SampleSceneDepth(screenUV);
  
  // Convert scene depth to scene position in object space.  
  float4 scenePositionCS = input.positionCS / input.positionCS.w;  
  scenePositionCS.z = sceneDepth;  
  float4 scenePositionWS = mul(UNITY_MATRIX_I_VP, scenePositionCS);  
  scenePositionWS /= scenePositionWS.w;  
  float3 scenePositionOS = TransformWorldToObject(scenePositionWS.xyz);
  
  // Calculate ray origin and ray direction in object space.  
  float3 rayOriginWS = _WorldSpaceCameraPos.xyz;  
  float3 rayDirectionWS = normalize(input.positionWS.xyz - rayOriginWS);  
  float3 rayOriginOS = TransformWorldToObject(rayOriginWS);  
  float3 rayDirectionOS = TransformWorldToObjectDir(rayDirectionWS);
  
  // Calculate the squared distance from the camera to the scene position. It will be used for depth testing.
  // rayOriginOS is a camera position in object space.  
  float sqDistanceToSceneOS = dot(scenePositionOS - rayOriginOS, scenePositionOS - rayOriginOS);  
    
  // Calculate the offset between consecutive layers.
  float secondOffset = 1.0 / (_LayerCount - 1.0);
  
  // Raycast first shell layer and second shell layer.  
  float firstTPlane = RaycastPlane(rayOriginOS, rayDirectionOS, float3(0.0, 1.0, 0.0), 0.0);  
  float secondTPlane = RaycastPlane(rayOriginOS, rayDirectionOS, float3(0.0, 1.0, 0.0), -secondOffset * _ShellOffset);
  
  // Distance between those two hits (hitOSDelta) will be used in a loop to speed up hit calculation.  
  float tPlaneDelta = secondTPlane - firstTPlane;  
  float3 firstHitOS = rayOriginOS + rayDirectionOS * firstTPlane;  
  float3 secondHitOS = rayOriginOS + rayDirectionOS * secondTPlane;  
  float3 hitOSDelta = secondHitOS - firstHitOS;
  
  // Accumulate the fog value here  
  float fogValue = 0.0;
  
  // Iterate through all shell layers here  
  [unroll(30)]  
  for (float i = 0; i < _LayerCount; i++)  
  {  
     // PositionOS of this layer  
     float3 planeHitOS = firstHitOS + hitOSDelta * i;
    
     // Depth testing - compare distance to the plane with distance to the scene.  
     float sqDistanceToPlaneHitOS = dot(planeHitOS - rayOriginOS, planeHitOS - rayOriginOS);  
     float depthTestMask = step(sqDistanceToPlaneHitOS, sqDistanceToSceneOS);
    
     // Shading the layer  
     float layerValue = GetFogValue(planeHitOS.xz + 0.5, secondOffset * i);
    
     // Accumulate the color  
     fogValue += layerValue * depthTestMask;  
  }
  
  // And blend into the screen once.  
  return float4((float3)fogValue * _Color.rgb, 0.0);  
}

This code is much more complex than the usual shell texturing shader.

Let's compare the performance of shell texturing versus ray marching. The same scene, the same camera angle.

:image-description:

Measured on RTX 3060.


The raymarching shader rendered more than twice as fast. The bottleneck shifted from the Screen Pipe (color blending) to the SM units (shader code execution), while providing almost the same visuals. The color is slightly different only because I did not match it exactly.

:image-description:

Both methods yield nearly the same visuals, but raymarching renders more than twice as fast.



Summary

Shell texturing is a shading technique that renders multiple offset layers of a surface to create depth and nuanced visual effects like fur, fog, grass, or energy shields.


When to use shell texturing:

  • For effects that need visible, discrete layers.

  • When you need a quick prototype or want to leverage GPU hardware blending instead of writing complex shaders.

  • On target devices where overdraw and memory bandwidth aren't a major bottleneck.


When to avoid shell texturing:

  • Volumetric effects (fog, fire, smoke, plasma), where continuous density matters - raymarching is faster and smoother.

  • On fill-rate-limited devices (low-end consoles, mobile, older GPUs).

  • For large on-screen effects - it is better to consolidate the logic into a single shader pass. Avoid displaying complex shell texturing effects across the whole screen.


Trade-offs:

  • More layers = higher visual fidelity, but also heavy bandwidth usage and overdraw.

  • Raymarching and in-shader accumulation give similar results at much lower cost - shifting the bottleneck to shader ALU instead of memory bandwidth.

  • Shell texturing is simple to implement, with several possible approaches, but none of them fix the fill-rate problem.


Read more:

Shell texturing uses a lot of VRAM bandwidth. Read what VRAM bandwidth is and why it is important to care about it during game development:
https://www.proceduralpixels.com/blog/vram-bandwidth-and-its-big-role-in-optimization

You can find all my blogs here:
https://www.proceduralpixels.com/blog

If you use LinkedIn, there is a post related to this article. Feel free to ask me a question there:
Link to a linkedin post

Hungry for more?

I share rendering and optimization insights every week.


I write expert content on optimizing Unity games, customizing rendering pipelines, and enhancing the Unity Editor.

Copyright © 2025 Jan Mróz | Procedural Pixels
