Shadows are a real problem.

They’re expensive, must be split into several shadow maps/levels, and they involve multiple passes.

It’s something we take for granted every day – shadows are just kind of *there*. It’s a *part* of lighting. But when we simulate lighting mathematically, the lighting model (regardless of it’s complexity) is simply unable to cover shadow casting within the environment. Shadow mapping has proven to scale well, but it’s not without it’s problems. In this article we will discuss an approximation shadow mapping technique known as Exponential Shadow Mapping – and see the role it plays in modern shadow algorithms.

# Exponential Shadow Maps

## Traditional Shadow Maps

Before we can discussing Exponential Shadow Maps, it’s imperative that you understand traditional shadow mapping. The traditional shadow map is basically the scene rendered from the perspective of the light (as if we could see everything the light touches), and depth information is captured for later calculations.

After we generate a shadow map, we then render the light volume from the point of view of the camera (how we did traditional lighting in my last post, Deferred Rendering Pipeline). The big difference when shadowing is involved – is that we reference the shadow map by translating our point to the light’s perspective-space, and access the depth value stored. Given that we have the depth from the light’s perspective, and the calculated depth from the fragment, we can see whether or not a point is in shadow.

To recap, the main workflow is something like this:

- Render the scene from the light’s perspective.

(For efficiency reasons, you should only render what the light volume actually**touches**). - Render the light from the camera’s perspective.
- For each fragment, move the view position of the pixel into the light’s perspective-space.

(**Light.ViewToLightPersp * viewPos**) - Offset and scale the light perspective point by
**0.5**so that it is in texture lookup coordinates [0, 1].

(**Matrices.PerspToTexture * lightPerspPos**)

*Note: Matrices.PerspToTexture looks like this:*

**Warning:**This matrix is**column-major**, which is the format GLSL expects you to present your matrices in. If you want to construct this matrix**row-major**, you must transpose the matrix. - Query the shadow map for the depth from the light’s perspective.

(**texture(shadowMap, texturePos.xy)**if un-projected,**textureProj(shadowMap, texturePos.xyw)**otherwise) - Compare the queried value to the actual depth value.

(found by taking post-projection light position**z**and dividing it by the near/far divisor**w;**e.g.**lightPerspPos.z / lightPerspPos.w**) - If the
**queriedLightDepth**is less than or equal to the**calculatedLightDepth**, then the current pixel is in shadow.

(Apply no lighting)

- For each fragment, move the view position of the pixel into the light’s perspective-space.

**To learn in more detail about shadow mapping, I recommend the following tutorial:
**

**http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-16-shadow-mapping/**

## Problems With Shadow Maps

In theory, everything above sounds perfect. A little inefficient, but mathematically sound. But when first implemented, you’ll be greeted with a pretty ugly result.

Let’s take the following fragment shader into consideration:

1 2 3 4 5 6 7 8 9 10 11 12 |
// Lighting Equation... // Shadow Effect float visibility = 1.0; vec4 shadowCoord = Light.ViewToShadowBias * vec4(viewPos, 1.0); shadowCoord /= shadowCoord.w; if (texture(shadowMap, shadowCoord.xy).r < shadowCoord.z) { visibility = 0.0; } // Lighting Equation... |

The above artifact is known as *“Shadow Acne”*, the problem is that the resolution of the shadow map isn’t *exactly* enough to satisfy every new point that we’re rendering from our perspective. And so, we get points which are mathematically “in shadow”, though technically they should appear lit. Another way of wording it is that the pixels appear to be ever so slightly **more distant** than the queried depth value. The core problem (at some level) is that our shadow map is of insufficient resolution to satisfy our current projection.

But we can’t just allocate a bigger shadow map, that doesn’t actually fix the problem – it may reduce noise, but it probably won’t reduce it by much. The problem is we would need an infinitely precise shadow map to get the values we actually expect – and that’s not possible.

Let’s hack our way out of this problem. A shadow map is already a simulation of lighting, maybe we can fudge the numbers a little to get more accurate *appearing* results.

For our first hack, we can introduce some **epsilon** which allows our depths to be off a little. The epsilon will have to differ based on the orientation and perspective of the light, since the amount of possible error is different based on the view/light perspective. A general solution to this is to introduce a bias which is scaled by the tangent of the slope. If you consider how the tangent function acts, this basically means the closer the slope is to 1 (meaning perpendicular to the view) the more *exponentially* we scale the bias by. As you can imagine, this is only an approximation for the error, but it tends to work in most cases. You may also want to clamp the bias value between some appropriate min/max range.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
// {Lighting Equation}... // Shadow Effect // Note: Bias must be applied post-projection for comparison float visibility = 1.0; float bias = 0.005 * tan(acos(NoL)); bias = clamp(bias, MinAllowedBias, MaxAllowedBias); vec4 shadowCoord = Light.ViewToShadowBias * vec4(viewPos, 1.0); shadowCoord.z -= bias; shadowCoord /= shadowCoord.w; if (texture(shadowMap, shadowCoord.xy).r < shadowCoord.z) { visibility = 0.0; } // {Lighting Equation}... |

This looks nicer, but there are still problems. First is the presence of an **if statement** in the shader code. If statements are very disrupting to GPUs, and can cause major slowdown for fragments which must branch in execution. One way to combat this is to set an edge in which the value moves from **0 -> 1**. The best function for a hard edge like this is GLSL’s **step** function. Though you may think it does the same thing as your if statement, it’s actually far more efficient about it. You should *always* prefer numerically adjusting values to branching statements.

1 2 3 4 5 6 7 8 9 10 11 12 |
// {Lighting Equation}... // Shadow Effect // Note: Bias must be applied post-projection for comparison float bias = 0.005 * tan(acos(lambertian)); bias = clamp(bias, MinAllowedBias, MaxAllowedBias); vec4 shadowCoord = Light.ViewToShadowBias * vec4(viewPos, 1.0); shadowCoord.z -= bias; shadowCoord /= shadowCoord.w; float visibility = step(shadowCoord.z, texture(shadowMap, shadowCoord.xy).r); // {Lighting Equation}... |

Okay it’s more efficient, but the original problem that introduced the shadow acne still exists in another fashion.

At some level, I can approach the shadow in such a way that I can easily identify the shadow’s edge. This is not how real shadows look since an environment isn’t ever lit by a single point of light, but rather a volume which distributes light around the room. These light areas are difficult to model, so we tend to represent light as if it comes from a single point for a first pass on lighting. If you look at a shadow in the real world, you’ll notice that the shadow has some area where it’s darkest, and then some blurry transition from shadowed to the lit area (known as the penumbra).

Traditional shadow maps present us with the following problem (notice the pixelation):

## Exponential Shadow Mapping

A simple way of tackling these hard-edged shadows is by simply taking many samples, or guessing based on some random variable distribution how much inside/outside of the shadow we are. One of the easiest implementations is Percentage Closer Filtering (PCF), and can be implemented utilizing the hardware by using **sampler2DShadow **uniform types. Another method is to do the sampling yourself using a Poisson distribution. However, Exponential Shadow Maps go an entirely different route with this.

Exponential Shadow Maps store a part of the **approximation** of the shadow casting test, and allows us to recombine it later for the actual approximation.

The whole equation for Exponential Shadow Maps is as follows:

In the above algorithm, **c** is some constant chosen by the user, **z** is the actual rendered depth from the light’s perspective, and **d** is the depth which we are testing against.

The derivation for this is pretty straight forward (and covered in the original paper linked to above) – so I won’t cover it here. However, the key factor is that this new representation of the shadow testing equation allows us to split an important part of the algorithm into the shadow mapping pass. With our previous shadow mapping technique, we could not filter or change the shadow map. The only way we could see where the “edge” of a shadow was, was by taking multiple samples. In this case, the test is split out *partly* into the the rendered shadow map, and since the final approximation is simply scaled by the shadow map value, we have more freedom to filter the values in the shadow map.

You can see how the math for this works out, if we have some values for **z** and **d** such that they approach each other, the product of the two factors approaches **1** (lit). The further apart these two values get, the more the two factors approach **0** (shadowed). For transparent objects, it’s possible to get a value greater than 1 – but that isn’t too much to worry about, we can simply clamp it. However, this convenient distance checking introduces another problem not previously known to shadow mapping – **light bleeding**. That’s what the **c** parameter is for. A larger **c** causes a larger potential difference between the two factors. It doesn’t really eliminate the problem, it just masks it in a semi-appropriate way.

The big problem is that as c becomes larger, the value stored in the shadow map increases, well, exponentially! This means that it’s very easy to run ourselves out of precision, and be stuck with shadow maps which simply don’t work. In your implementation, you may also try to scale the end result in the shadow casting fragment shader so that it hides the light bleeding a bit – I haven’t tested it, but I’d imagine it could do some work without ruining your precision.

The core shadow mapping algorithm remains largely unchanged, except that the value we save from the mapping pass isn’t simply the depth (eg. we need a fragment shader) and the calculation of the shadow caster visibility is different.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
#include <LightBuffer.ubo> #include <Math.glsl> // map_01 // Output Values layout(location = 0) out float expMap; void main() { // Save the post-projection w value [0,1] // Recall: gl_FragCoord.w = 1.0 / postProjPos.w float depthDivisor = (1.0 / gl_FragCoord.w); float mappedDivisor = map_01(depthDivisor, Light.NearPlane, Light.MaxFalloff); // Exponential is a configurable constant for approximation. // Generally a higher Exponential means greater difference in depths. // Because of this there will be less error, but we may run out of precision. expMap = exp(Light.Exponential * mappedDivisor); } |

1 2 3 4 5 6 7 8 9 |
// Lighting Equation ... // Shadow Effect vec4 shadowCoord = Light.ViewToShadowBias * vec4(viewPos, 1.0); float occluder = textureProj(shadowMap, shadowCoord.xyw).r; float reciever = map_01(shadowCoord.w, Light.NearPlane, Light.MaxFalloff); float visibility = saturate(occluder * exp(-Light.Exponential * reciever)); // Lighting Equation ... |

The visibility in the above fragment will be some value [0,1] such that it represents the approximate light unobstructed at this point. This is basically a 1:1 correlation to the exponential shadow map equation listed above. The biggest thing to keep in mind is that the equation actually represents an *approximation* to a shadow. Now we can blur the shadow map to give a soft shadow effect.

## Compute Shader Gaussian Blur

### Update (3/22/15):

I was going to provide some reasoning for the values used in the compute shader, as well as information about when to prefer compute shaders to fragment shaders. But it’s a separate topic, so for now we will simply show that a compute shader is possible. We will later cover when to prefer a compute shader, and what some of the compute shader best practices are. I would like to give it more time than this article would allow.

I haven’t tested whether it would be faster to use a compute shader or a regular fragment shader to apply this blur. Regardless, I figured it was probably a good time to become acquainted with compute shaders. A compute shader is a special kind of shader whose only input is through the form of uniforms, shader storage buffer objects, bound textures, and dispatch sizes. It’s basically a way to program the GPU to do some general computing for you, and hopefully return the result in less time.

The biggest issue with compute shaders is knowing when it’s appropriate. Sometimes the setup time of a compute shader is enough that doing it on your own (on the CPU, or within the fragment shader) would actually be more efficient! So, efficiency testing aside, we’re going to delve into compute shaders 101.

### General GPU Layout

Many programmers are used to thinking of the few cores they have on the CPU. Kernel handles scheduling, so many times we don’t really even need to worry ourselves about which core we run on. Writing code which executes on a GPU is a little different. Let’s look a very simplified memory model for the modern GPU.

Of course, scale this to * hundreds* of threads in each workgroup, and

*of workgroups in each GPU. Essentially, this is what you’re programming for. The closer to the thread that you put the memory, the faster access to the memory will be. The general idea for a GPU is that these workgroups all work together on a problem, and then the answer is put back in some range that no one but that workgroup is allowed to touch.*

**hundreds***Fun fact, this is actually why it’s sometimes called a*

**fragment**shader, because you’re programming for the workgroup (which has a fragment of**pixels**[threads] generating output).So the idea here is that we will separate parts of the picture out to be blurred by each workgroup, and each workgroup will help build the final blurred image. We want to store temporary information in the **shared memory** space so that we can efficiently calculate a result, and then we will commit the information back to the **global memory** space for access. That’s a really quick overview of what a GPU (approximately) looks like. Another important fact is that workgroups* cannot communicate with one another*.

### Local Workgroup Layout

As mentioned, compute shaders don’t actually have any input values. Unlike vertex and fragment shaders which have very obvious in/out keywords, compute shaders don’t allow it for general variables. There is one main input that the user can provide – local workgroup size. This basically says how big the requested workgroup is, or rather – how many threads we want to populate for a particular work item. We can select any dimension of local thread size, the end result of executed threads for that workgroup is all of the sizes multiplied together. The glsl for this workgroup size is a special nameless in attribute.

1 |
layout (local_size_x = X, local_size_y = Y, local_size_z = Z) in; |

If you want only a 1-dimension workgroup, you are able to only provide the **local_size_x** value, in this case the other values are implied to be **1**. If a size greater than the maximum workgroup size allowed by the hardware has been requested, a compiler error will occur.

### Shared Keyword

The shared keyword basically declares some variable that is to be shared within the workgroup. Creating a variable like this is as simple as adding the keyword shared to the beginning of any declaration. The shared memory cannot be initialized to anything though (which thread would get to initialize it?) – so if you want the values to start at some number, you have to have some way of having the threads work together to initialize the values.

1 2 |
shared gtype variableName; // Cannot be initialized! shared gtype arrayName[100]; // Requires an array size! |

As mentioned above, the idea is that shared memory access for the workgroup is faster than global memory access. So we can load information into the shared workgroup space, everyone can reference and write to it, and then everyone sends information back to the global memory space.

### Predefined Variables

Like with fragment and vertex shader, compute shaders define some variables that allow a thread to see information about it’s execution. Where it is in terms of the global execution queue, which unique identifier the local thread has, and the size of the current workgroup.

- uvec3
**gl_WorkGroupSize**- Defines the size requested by the Local Workgroup Layout.
**(local_size_x, local_size_y, local_size_z)**

- Defines the size requested by the Local Workgroup Layout.
- uvec3
**gl_NumWorkGroups**- Defines the values passed into the C call to glDispatchCompute(x, y, z).
**(x, y, z)**

- Defines the values passed into the C call to glDispatchCompute(x, y, z).
- uvec3
**gl_WorkGroupID**- Defines the unique identifier of the workgroup based on the requested executions by C call to glDispatchCompute().
- Within the range
**(0, 0, 0) -> (gl_NumWorkGroups.x – 1, gl_NumWorkGroups.y – 1, gl_NumWorkGroups.z)**

- uvec3
**gl_LocalInvocationID**- Identifies the current executing thread’s unique identifier for this workgroup.
- Within the range
**(0, 0, 0) -> (gl_WorkGroupSize.x – 1, gl_WorkGroupSize.y – 1, gl_WorkGroupSize.z – 1)**

- uvec
**gl_GlobalInvocationID**- Identifies the global unique identifier for the current thread based on a combination of the above variables.
- Can be recalculated by
**gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID**

### Final Compute Shader

Though I haven’t covered all of the intricacies, the rest of the information can pretty easily be found in comments and code provided. This is my final blur compute shader.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 |
/******************************************************************************* * compute/gaussianBlur.comp *------------------------------------------------------------------------------ * Apply the blur over some weights in a direction provided through uniforms. ******************************************************************************/ // Thread group size 128x1x1 layout (local_size_x = 128) in; // Blur Information uniform BlurData { int Width; // w int Width2; // 2w float Weights[65]; // Weight[2w + 1] = { ... } } Blur; uniform ivec2 Direction; // Inputs / Outputs layout (r32f) uniform readonly image2D src; layout (r32f) uniform writeonly image2D dst; // Shared Workspace (Max w = 32) shared float v[128 + 64]; void main() { // Note: // WorkGroupSize = (local_size_x,local_size_y,local_size_z) // ^- layout declared at top of compute shader. // WorkGroupId = [(0,0,0), (num_groups_x,num_groups_y,num_groups_z)] // ^- Parameters passed in from glDispatchCompute(). // LocalInvocation = [(0,0,0), (local_size_x-1,local_size_y-1,local_size_z-1] // ^- Essentially the current executing core. // // GlobalInvocation = GroupId * GroupSize + LocalInvocation ivec2 currTexel = ivec2(gl_GlobalInvocationID.x * Direction + gl_GlobalInvocationID.y * (1 - Direction)); uint texelIndex = gl_LocalInvocationID.x; int workWidth = int(gl_WorkGroupSize.x); // Load image information into temporary workspace ivec2 sourceTexel = currTexel - Blur.Width * Direction; v[texelIndex] = imageLoad(src, sourceTexel); // First 2w threads will load the last 2w texels. if (texelIndex < Blur.Width2) { v[texelIndex + workWidth] = imageLoad(src, sourceTexel + workWidth * Direction); } // We must wait until all information is loaded into the shared work group. // This waits until all local threads get here, and then they can all continue. barrier(); // Calculate blurred results for each pixel float result = 0.0; for (int j = 0; j <= Blur.Width2; ++j) { result += v[texelIndex + j] * Blur.Weights[j]; } imageStore(dst, currTexel, vec4(result)); } |

The Gaussian Blur weights provided can be generated by using the Normal Distribution equation, with some given input parameter for the standard deviation. I’m fairly new to random distributions, in fact I just started learning about them this year – so I’m unsure if I’m implementing them in the best way possible. What I opted to do was simply create a class which will generate the distribution data, and be able to be written to a Uniform Buffer object for usage within the compute shader. I’ve also opted to limit the blur to a width of 32 (the compute shader requires some constant size for shared memory declarations).

I’ve simply coded up the Normal Distribution function, and allow a variable for sample **position**, **mean**, and **standard deviation** to be passed into it. Values at certain points are then calculated, and in the end we divide each value in our normal distribution by the sum of the values (normalizing it so that the sum is 1). In general, we don’t want to offset the normal distribution, so my “BlurDataBuilder” really only accepts **width**, and **standard deviation**, passing 0s into the normal distribution function for the mean. How I generate such weights is as listed:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
OpenGLBlurData::OpenGLBlurData(int width, float deviation) : m_blurWidth(width) { float total = 0.0f, current; // Make sure the BlurData is within the valid range. if (m_blurWidth < 1) m_blurWidth = 1; if (m_blurWidth > 32) m_blurWidth = 32; m_blurWidth2 = 2 * m_blurWidth; // Calculate the original normal distribution. for (int i = 0; i < m_blurWidth; ++i) { current = Karma::normalDist(float(m_blurWidth - i), 0.0f, deviation); m_weights[i] = m_weights[m_blurWidth2 - i] = current; total += 2.0f * current; } m_weights[m_blurWidth] = Karma::normalDist(0.0f, 0.0f, deviation); total += m_weights[m_blurWidth]; // Normalize the values so that they sum to 1 for (int i = 0; i <= m_blurWidth2; ++i) { m_weights[i] /= total; } } |

## Putting it All Together

The idea is that you do two passes which will blur the screen both horizontally and vertically, producing a Gaussian blurred image. I’ve noticed that in general I want my width to match my standard deviation, or else the weights provide little-to-no difference past the standard deviation’s “bounds”. Below are a few pictures showing both the exponential shadow map (in order to display it we take the log of the value and divide by the Exponential). One picture shows an unfiltered map (e.g. no blur), the other picture shows a filtered map with a width of 8 and a standard deviation of 8.

## Final Renders

Exponential Shadow Maps are a pretty simple addition to any existing graphics framework. It just requires a little understanding of the underlying topics, and what the variables in the Exponential Shadow Mapping equation stand for. The hardest part of any shadow mapping algorithm is setting the framework up for the multiple passes with different render properties. After that it’s just implementing variations of different shadow mapping techniques. Though I dislike the approximate value that the Exponential Shadow Maps provide, I understand that it’s extremely convenient to be able to soften shadows in such a trivial way – so I will likely deal with any shortcomings of the algorithm (too small to really care about).

**— And the below pictures are properly gamma corrected to allow for multiple lights —**