Previously we updated our BRDF to use Physically Based Shading (PBS) and overhauled the ambient term with Importance Sampled Image Based Lighting (IBL). These changes alone went a long way toward making our renderer look realistic (by real-time standards). The improved ambient term has one issue, though: the ambient contribution is independent of the geometry and of how much of the environment is actually visible. For a more realistic render, we must approximate how much of the environment each fragment sees. We do this through a process known as Ambient Occlusion.
Ambient Occlusion
Ambient Occlusion is the process of, well, occluding the ambient term. As simple as that sounds, remember that our ambient term is an approximation of its own. We’re not actually raycasting, so we cannot rely on the implied probability of light making its way into cracks and crevices for our occlusion term. That being said, we do have enough information to give a pretty good estimate of the occlusion term for each fragment.
Slide to compare SSAO Disabled (Left) and SSAO Enabled (Right).
We usually prefer to work in screen space, because it reduces the likelihood of precision errors when we happen to be far away from the origin. So whenever you see “screen space”, you should already be interested. Ambient Occlusion is no different, so we will prefer Screen Space Ambient Occlusion (SSAO) over the alternatives.
Alchemy Ambient Occlusion
I looked at a few different kinds of ambient occlusion, but the results from the Alchemy paper stood out as the best to me. Alchemy defines the obscurance estimate as follows:

$$A = \max\left(0,\ 1 - \frac{2\sigma}{s}\sum_{i=1}^{s}\frac{\max(0,\ \vec{v}_i\cdot\hat{n} + z_C\beta)}{\vec{v}_i\cdot\vec{v}_i + \epsilon}\right)^{k}$$

Where…
$s$ is the number of samples.
$\vec{v}_i$ is the vector from the fragment position to the sample position.
$\hat{n}$ is the fragment position’s normal.
$z_C$ is the camera-space depth of the fragment.
$\beta$ is a threshold modifier for fixing depth-based precision issues and light bleeding.
$\sigma$ is the shadow scalar (typically used to increase the strength of the occlusion).
$k$ is a power modifier for controlling the contrast of the shadow.
$\epsilon$ is a small epsilon that prevents division by 0 (typically 0.0001).
That all sounds pretty complicated, but geometrically it’s fairly simple. We sample texels around the current fragment, and for each one we dot the fragment’s normal $\hat{n}$ with the vector $\vec{v}$ from the fragment to the sampled texel. This gives us an intensity of occlusion (the component of $\vec{v}$ along $\hat{n}$), which we normalize by dividing by $\vec{v}\cdot\vec{v}$: the result is the cosine of the angle between $\vec{v}$ and $\hat{n}$, scaled by $1/\|\vec{v}\|$. Even a sample near the edge of the sampling radius can produce a strong obscurance factor if its $\vec{v}$ lies along the fragment’s normal, which is what happens with overhanging geometry (for example, a fragment underneath a car samples the bottom of the car itself).
The depth modifier $z_C\beta$ applies an additive offset to handle cases where distance introduces error between our normal $\hat{n}$ and the vector $\vec{v}$. In these cases, $\beta$ scales the shadow when we are within a corner or crack. The scalar $\sigma$ changes the intensity of the shadowed parts by being multiplied into the final summation value. I’ve found that some adjustment of $k$, which controls the overall contrast between the shadowed and non-shadowed areas, is necessary for crisper shadows. This is very similar to the contrast term in tone mapping algorithms, which is also applied as a power.
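To make the geometry concrete, here is a minimal CPU-side sketch in C++ (a stand-in for the GLSL so it can be run and checked off the GPU) of a single sample’s contribution to the sum. The `Vec3` type and the `alchemyTerm` name are hypothetical helpers, not part of the original shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal stand-in for GLSL's vec3 (hypothetical helper).
struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One sample's obscurance contribution from the Alchemy sum:
//   max(0, v . n + zC * beta) / (v . v + epsilon)
// P: fragment position, N: unit fragment normal, Pi: sampled position (all view space).
float alchemyTerm(Vec3 P, Vec3 N, Vec3 Pi, float zC, float beta, float epsilon = 0.0001f) {
  assert(epsilon > 0.0f);  // epsilon only guards against division by zero
  Vec3 v = Pi - P;         // vector from the fragment to the sample
  return std::max(0.0f, dot(v, N) + zC * beta) / (dot(v, v) + epsilon);
}
```

With $\beta = 0$, a sample straight along the normal at distance 1 contributes roughly 1, the same direction at distance 2 contributes about half of that, and a tangent sample on a flat surface (or one below it) contributes nothing.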
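Alchemy’s estimator was itself derived from an earlier form:

$$A \approx 1 - \frac{2\pi u}{s}\sum_{i=1}^{s}\frac{\max(0,\ \vec{v}_i\cdot\hat{n} + z_C\beta)\,H(r - \|\vec{v}_i\|)}{\max(u^2,\ \vec{v}_i\cdot\vec{v}_i)}$$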
This earlier form contains the Heaviside step function $H$ (GLSL’s step function) for ignoring points outside of the radius $r$. I found that the control over the shadow’s intensity via $\sigma$ was too important to give up, and dividing by the potentially larger falloff constant $u^2$ (depending on the length of $\vec{v}_i$) made the leading factor unusable as a shadow scalar; I found myself constantly adjusting $k$ for added contrast. This was not ideal, so I kept the removal of $u$ from the denominator, but I did not remove the Heaviside function (geometrically, I could not make sense of their decision to remove it). My reasoning is that I don’t want to take into account points that lie far outside of the allowable ambient occlusion radius. These modifications bring us to the following function. (I’ve hard-coded $\epsilon$ as the recommended 0.0001, since its only purpose is to prevent division by 0, though it could just as well be configurable. Note also that, as in the shader below, the contrast power $k$ is applied to the scaled sum rather than to the clamped result.)

$$A = \max\left(0,\ 1 - \left(\frac{2\sigma}{s}\sum_{i=1}^{s}\frac{\max(0,\ \vec{v}_i\cdot\hat{n} + z_C\beta)\,H(r - \|\vec{v}_i\|)}{\vec{v}_i\cdot\vec{v}_i + 0.0001}\right)^{k}\right)$$
Note: I attempted smooth blending between an inner and an outer radius using GLSL’s smoothstep. But I found that I often set the two radii equal, which removes the need for the smoothstep. It makes sense that only the Heaviside step is needed, because the amount of occlusion is already accounted for by $\vec{v}\cdot\hat{n}$ in the numerator. It’s not the same physical property (it doesn’t represent a separation that lets light bleed in; it represents an overhang), but I accepted the overhang-only approach, as the shadows looked more reasonable with smaller values.
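As a hedged C++ reference for the modified estimator described above (Heaviside step kept, $u$ removed, $\epsilon$ fixed at 0.0001, contrast power applied to the scaled sum as the shader does), this sketch takes already-reconstructed view-space sample positions. `ambientVisibility` and its parameter names are hypothetical; in the shader they correspond to SampleRadius, ShadowScalar, DepthThreshold and ShadowContrast, and the fragment’s `P.z` plays the role of $z_C$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for GLSL's vec3 (hypothetical helper).
struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
inline float step(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

// Modified Alchemy estimator for one fragment.
// P, N: fragment view-space position and unit normal; samples: view-space sample positions.
float ambientVisibility(Vec3 P, Vec3 N, const std::vector<Vec3>& samples,
                        float radius, float sigma, float beta, float k) {
  assert(!samples.empty());
  float sum = 0.0f;
  for (const Vec3& Pi : samples) {
    Vec3 v = Pi - P;                            // fragment -> sample
    float sqrLen = dot(v, v);
    float H = step(std::sqrt(sqrLen), radius);  // Heaviside: drop samples beyond r
    float dD = beta * P.z;                      // depth-scaled offset z_C * beta
    sum += std::max(0.0f, dot(N, v) + dD) * H / (sqrLen + 0.0001f);
  }
  sum *= (2.0f * sigma) / static_cast<float>(samples.size());
  return std::max(0.0f, 1.0f - std::pow(sum, k));  // contrast applied to the scaled sum
}
```

A flat, unoccluded neighborhood returns full visibility (1.0), an overhanging sample inside the radius darkens the fragment, and the same occluder placed outside the radius is ignored by the step.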
The other change I made was suggested by the paper Scalable Ambient Obscurance: apply a per-pixel random rotation based on an XOR hash of the fragment’s x and y coordinates.
```glsl
// Generate a random angle based on the XY XOR hash
float randAngle()
{
  uint x = uint(gl_FragCoord.x);
  uint y = uint(gl_FragCoord.y);
  // * and + bind tighter than ^, so this is (30x) ^ (y + 10xy); convert explicitly to float.
  return float((30u * x) ^ (y + 10u * x * y));
}
```
This angle is used when picking each sample point on the disk: it applies a “predictably random” rotation to the initial sample-point selection, so that neighboring pixels with similar surrounding geometry don’t all sample the same points. Unfiltered, that shows up as identifiable “repeats” in the shadowed geometry. I mentioned this briefly in my previous blog post, Physically Based Shading and Importance Sampled Image Based Lighting, but I did not show an example of the artifacts it produces. Below is an example of what happens if I sample without applying some kind of random rotation to each fragment’s sample points.
Example of the Error
(See below for what proper sampling looks like)
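The hash and the rotated disk offset can be reproduced on the CPU to inspect their values; the C++ sketch below mirrors the shader’s `E`/`sE` computation. C++ `uint32_t` arithmetic wraps modulo 2^32 just like GLSL’s `uint`. The Hammersley implementation (i/N paired with the base-2 radical inverse) is an assumption, since Math.glsl isn’t shown here, and `sampleOffset` is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>

struct Vec2 { float x, y; };

// Mirrors the GLSL randAngle(): (30x) ^ (y + 10xy) with wrapping 32-bit unsigned math.
float randAngle(uint32_t x, uint32_t y) {
  return static_cast<float>((30u * x) ^ (y + 10u * x * y));
}

// Base-2 radical inverse (Van der Corput) via bit reversal.
float radicalInverse(uint32_t bits) {
  bits = (bits << 16u) | (bits >> 16u);
  bits = ((bits & 0x55555555u) << 1u) | ((bits & 0xAAAAAAAAu) >> 1u);
  bits = ((bits & 0x33333333u) << 2u) | ((bits & 0xCCCCCCCCu) >> 2u);
  bits = ((bits & 0x0F0F0F0Fu) << 4u) | ((bits & 0xF0F0F0F0u) >> 4u);
  bits = ((bits & 0x00FF00FFu) << 8u) | ((bits & 0xFF00FF00u) >> 8u);
  return static_cast<float>(bits) * 2.3283064365386963e-10f;  // bits / 2^32
}

// Assumed Hammersley point: (i / n, radicalInverse(i)).
Vec2 hammersley(uint32_t i, uint32_t n) {
  return {static_cast<float>(i) / static_cast<float>(n), radicalInverse(i)};
}

// Screen-space offset for sample i of fragment (fragX, fragY), following the shader:
// E = Hammersley * (pi, 2pi); E.y += randAngle(); sE = (cos E.y, sin E.y) * radius * cos E.x
Vec2 sampleOffset(uint32_t i, uint32_t n, uint32_t fragX, uint32_t fragY,
                  float perspectiveRadius) {
  const float pi = 3.14159265358979f;
  Vec2 h = hammersley(i, n);
  float ex = h.x * pi;
  float ey = h.y * 2.0f * pi + randAngle(fragX, fragY);  // per-fragment rotation
  float r = perspectiveRadius * std::cos(ex);
  return {std::cos(ey) * r, std::sin(ey) * r};
}
```

Every offset stays within the disk of the given radius, while neighboring fragments (for example (10, 10) and (11, 10)) receive different rotation angles.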
I also use an arithmetic “if” (GLSL’s step function) to make sure the current sample is within the visible region. This prevents the screen edges from bleeding into the SSAO buffer as “shadowed” area, which is incorrect. You can see these artifacts for yourself by omitting the EdgeError variable from the calculation (below) and rendering a purely smooth surface that extends off one of the screen edges.
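A small C++ mirror of that mask, using a hypothetical `edgeMask` name and a `step` helper that follows GLSL’s definition (0.0 when x < edge, 1.0 otherwise):

```cpp
// GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
inline float step(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

// 1.0 when the sample's uv lies inside [0, 1] x [0, 1], else 0.0 (mirrors EdgeError).
float edgeMask(float u, float v) {
  return step(0.0f, u) * step(0.0f, 1.0f - u) * step(0.0f, v) * step(0.0f, 1.0f - v);
}
```

Multiplying each sample’s contribution by this mask drops off-screen samples entirely instead of letting a clamped texture lookup report them as occluders.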
Putting this together, my final ambient occlusion shader looks like this:
```glsl
/*******************************************************************************
 * lighting/ambientOcclusion.frag
 *------------------------------------------------------------------------------
 * Calculates the ambient occlusion term to be multiplied in during the
 * environment pass. Also could mask the pass at a later time if needed.
 ******************************************************************************/

#include <GBuffer.ubo>      // viewPosition, normal
#include <Math.glsl>        // Hammersley, randAngle
#include <GlobalBuffer.ubo> // Current.Dimensions

// Occlusion Output
layout(location = 0) out highp float fFragColor;

// Note: Keep in mind that the proper values for the ambient occlusion
// shader should change depending on the units in which your game world
// works within. It may require dynamic modification of these terms.
uniform float SampleRadius = 1.0;
uniform float ShadowScalar = 1.3;
uniform float DepthThreshold = 0.0025;
uniform float ShadowContrast = 0.5;
uniform int NumSamples = 20;

void main()
{
  float visibility = 0.0;
  vec3 P = viewPosition();
  vec3 N = normal();
  float PerspectiveRadius = (SampleRadius / P.z);

  // Main sample loop, this is where we will perform our random
  // sampling and estimate the ambient occlusion for the current fragment.
  for (int i = 0; i < NumSamples; ++i)
  {
    // Generate Sample Position
    vec2 E = Hammersley(i, NumSamples) * vec2(pi, pi2);
    E.y += randAngle(); // Apply random angle rotation
    vec2 sE = vec2(cos(E.y), sin(E.y)) * PerspectiveRadius * cos(E.x);
    vec2 Sample = gl_FragCoord.xy / Current.Dimensions + sE;

    // Create Alchemy helper variables
    vec3 Pi = viewPosition(Sample);
    vec3 V = Pi - P;
    float sqrLen = dot(V, V);
    float Heaviside = step(sqrt(sqrLen), SampleRadius);
    float dD = DepthThreshold * P.z;

    // For arithmetically removing edge-bleeding error
    // introduced by clamping the ambient occlusion map.
    float EdgeError = step(0.0, Sample.x) * step(0.0, 1.0 - Sample.x) *
                      step(0.0, Sample.y) * step(0.0, 1.0 - Sample.y);

    // Summation of Obscurance Factor
    visibility += (max(0.0, dot(N, V) + dD) * Heaviside * EdgeError) / (sqrLen + 0.0001);
  }

  // Final scalar multiplications for averaging and intensifying shadows
  visibility *= (2.0 * ShadowScalar) / float(NumSamples);
  visibility = max(0.0, 1.0 - pow(visibility, ShadowContrast));
  fFragColor = visibility;
}
```
Bilateral Blur
Much like with IBL, we notice artifacts and spottiness when we take few samples. In this case we must blur the sampled image to hide the under-sampling. Strictly speaking, a bilateral blur is not separable, so it can’t be done exactly with a two-pass compute shader, but we will do it that way anyway, because for the most part the error this introduces is hard to notice.
Slide to compare Unfiltered (Left) and Filtered (Right).
The algorithm for blurring an image while preserving hard cuts and edges is very similar to our original Gaussian blur from the Exponential Shadow Mapping article. The only difference is that we now also weight each texel’s contribution by the difference in view depth between the blurred point and each contributing point, so texels with a larger depth difference contribute less. Like the previous blurring algorithm, we do this in two passes (vertical and horizontal). As noted above, this is technically incorrect, but the results are generally indistinguishable from a gameplay perspective, so it’s best to accept the error and proceed with the simplest method.
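The algorithm looks something like this:

$$I'(p) = \frac{\sum_{q \in \Omega} G(\|p - q\|)\, R(p, q)\, I(q)}{\sum_{q \in \Omega} G(\|p - q\|)\, R(p, q)}$$

Where… $p$ is the texel being blurred, $q$ ranges over the texels $\Omega$ in the blur window, $I$ is the unfiltered ambient occlusion value, $G$ is the Gaussian spatial weight from our previous blur, and $R(p, q)$ is the range modifier.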
Effectively, the only change from our previous blur compute shader is computing the range modifier function $R(p, q)$. $\sigma$ represents the standard deviation of our normal distribution, while $\sigma^2$ is its variance. The depth and normal values used by the range modifier come from either the current fragment or the comparison point (all in view space). To speed things up, I loaded this data (the view depths and view normals) into shared memory, so that once everything was loaded, the workgroup could simply crunch numbers and return the result.
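As one plausible, assumed way to fold both view depth and view normals into the range modifier, the C++ sketch below multiplies a Gaussian on the depth difference (standard deviation $\sigma$, variance $\sigma^2$) by a clamped cosine between the normals raised to a sharpness exponent. The normal term, `rangeWeight`, and `normalPower` are illustrative assumptions, not the formula used in this article.

```cpp
#include <algorithm>
#include <cmath>

// Minimal stand-in for GLSL's vec3 (hypothetical helper).
struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumed range modifier R(p, q) for a depth- and normal-aware bilateral blur.
// depthTerm: Gaussian on the view-depth difference (std dev sigma, variance sigma^2).
// normalTerm: clamped cosine between unit view normals, sharpened by normalPower.
float rangeWeight(float depthP, float depthQ, Vec3 normalP, Vec3 normalQ,
                  float sigma, float normalPower) {
  float dz = depthQ - depthP;
  float depthTerm = std::exp(-(dz * dz) / (2.0f * sigma * sigma));
  float normalTerm = std::pow(std::max(0.0f, dot(normalP, normalQ)), normalPower);
  return depthTerm * normalTerm;
}
```

Texels at the same depth with the same normal get full weight, a perpendicular normal zeroes the contribution, and any depth difference reduces it.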
Note: It was pointed out to me that the compute shader from the Exponential Shadow Map tutorial has an error: bounds were not being checked properly. For spotlight Exponential Shadow Maps, these errors generally appeared within the region that is shadowed entirely by the spotlight factor (so I didn’t bother fixing them), but they can be troublesome for other types of lights. For blurring the ambient occlusion, this is definitely a problem, since we see and use all of the ambient occlusion map. So another change is that you must check your bounds and do whatever you can to reduce banding when blurring near the edges.
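Here is a minimal sketch of one pass (a single row) of such a depth-aware bilateral blur, written in C++ on the CPU rather than as a compute shader. It assumes a Gaussian over texel distance and, for brevity, only a depth-difference Gaussian as the range term; `bilateralRow`, `sigmaSpatial`, and `sigmaDepth` are hypothetical names. Taps that fall outside the row are skipped and the remaining weights renormalized, rather than reading clamped edge texels.

```cpp
#include <cmath>
#include <vector>

// One horizontal pass of a depth-aware bilateral blur over a single row.
// ao and depth hold one entry per texel; radius is the half-width of the kernel.
std::vector<float> bilateralRow(const std::vector<float>& ao, const std::vector<float>& depth,
                                int radius, float sigmaSpatial, float sigmaDepth) {
  const int n = static_cast<int>(ao.size());
  std::vector<float> out(ao.size());
  for (int p = 0; p < n; ++p) {
    float sum = 0.0f, weightSum = 0.0f;
    for (int o = -radius; o <= radius; ++o) {
      const int q = p + o;
      if (q < 0 || q >= n) continue;  // bounds check: skip taps outside the row
      float spatial = std::exp(-static_cast<float>(o * o) / (2.0f * sigmaSpatial * sigmaSpatial));
      float dz = depth[q] - depth[p];
      float range = std::exp(-(dz * dz) / (2.0f * sigmaDepth * sigmaDepth));
      sum += spatial * range * ao[q];
      weightSum += spatial * range;
    }
    out[p] = sum / weightSum;  // the center tap has weight 1, so weightSum >= 1
  }
  return out;
}
```

Running the same pass down the columns of the horizontally blurred result gives the two-pass approximation described above; across a depth discontinuity the two sides do not bleed into each other.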
Again, unfortunately, I have not compared the compute shader’s performance with a fragment shader. I simply wanted to become comfortable with compute shaders. At a later date I will hopefully get around to comparing a good compute shader against a good fragment shader for filtering; I’d like to give it more thought than these posts currently allow. (Most likely I’ll get into identifying bottlenecks when I get around to a major refactor of the framework I’ve developed.)
Final Images
And with just one fullscreen quad and 20 samples per fragment, clocking in at 9.6 million samples for a traditional 800×600 application (800 × 600 × 20), Screen Space Ambient Occlusion is born! This generally requires less filtering and sampling than IBL (unless you’ve pre-computed your IBL, in which case SSAO is definitely the hog here). The results are beautiful, and SSAO really helps to ground objects within the environment. For the following set of screenshots, I decided the best way to show the effect was with Disabled/Enabled sliders, and I have the SSAO turned way up in terms of shadow intensity and contrast. The computation is still the same, and they’re all still 20 samples; they just look pretty darn good like that.
Waoh, looks nice !
Thank you – Glad you liked ’em!
Trent, wonderful article! Really solid work, and expertly explained. But I think it might be the case that the sliders are backwards, at least in Google Chrome? I seem to be getting the SSAO enabled images (and filtered dragon) when the slider is to the left.
Oh odd, I don’t seem to be having this issue. Thanks for bringing it to my attention, though. Would you mind sending a screenshot of the issue to treed0803@gmail.com so I can confirm it? Everything seems solid on my end (Chrome).
Great article! Just a minor comment that in the initial formula, the power term, “k”, is missing (it looks like it’s been cropped out of the image).
Hey! I know that it’s been quite some time since you wrote the code. I have a few questions. What is the viewPosition()? You seem to get that function from a buffer object and it can take Samples as an input? Are the Current.Dimensions just the window’s X and Y dimensions? What are pi and pi2 in line 36? And lastly: Is there a workaround if I don’t have a normal buffer? I was trying to get this working in Vizard, whose OpenGL implementation is kind of outdated.
Hey Olaf,
viewPosition() returns the view-space position of the currently executing fragment. This can be reconstructed with some clever math using only the fragment’s uv on the fullscreen quad, the depth value (however you store it), and the Perspective-To-View (inverse projection) matrix, which moves us backwards through the graphics pipeline from perspective space to view space. Or you can just have the view-space position of the fragment ready in some manner.
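A minimal C++ sketch of that reconstruction, assuming a standard OpenGL-style symmetric perspective projection and a depth buffer that stores window-space depth in [0, 1]: convert uv and depth to NDC, apply the inverse projection, and divide by w. `Camera`, `projectToUvDepth`, and `viewPositionFromDepth` are hypothetical names, and the inverse matrix is written out analytically for this particular projection.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical camera description for a symmetric OpenGL-style perspective projection.
struct Camera { float fovy, aspect, zNear, zFar; };

// Entries of the projection matrix P = [[a,0,0,0],[0,b,0,0],[0,0,c,d],[0,0,-1,0]].
struct ProjTerms { float a, b, c, d; };

ProjTerms projTerms(const Camera& cam) {
  float t = std::tan(cam.fovy * 0.5f);
  float range = cam.zFar - cam.zNear;
  return {1.0f / (cam.aspect * t), 1.0f / t,
          -(cam.zFar + cam.zNear) / range, -2.0f * cam.zFar * cam.zNear / range};
}

// Forward direction (useful for testing): view-space point -> uv and depth in [0, 1].
void projectToUvDepth(const Camera& cam, Vec3 p, float& u, float& v, float& depth) {
  ProjTerms m = projTerms(cam);
  float w = -p.z;  // clip-space w for this projection
  u = (m.a * p.x / w) * 0.5f + 0.5f;
  v = (m.b * p.y / w) * 0.5f + 0.5f;
  depth = ((m.c * p.z + m.d) / w) * 0.5f + 0.5f;
}

// Inverse direction: uv + stored depth -> view-space position.
Vec3 viewPositionFromDepth(const Camera& cam, float u, float v, float depth) {
  ProjTerms m = projTerms(cam);
  float ndcX = u * 2.0f - 1.0f, ndcY = v * 2.0f - 1.0f, ndcZ = depth * 2.0f - 1.0f;
  // P^-1 = [[1/a,0,0,0],[0,1/b,0,0],[0,0,0,-1],[0,0,1/d,c/d]] applied to (ndc, 1).
  float x = ndcX / m.a, y = ndcY / m.b, z = -1.0f;
  float w = (ndcZ + m.c) / m.d;
  return {x / w, y / w, z / w};
}
```

Projecting a view-space point and reconstructing it from its uv and depth should return the original point up to floating-point error.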
Yup, Current.Dimensions is used to normalize gl_FragCoord into [0, 1]. Note that by default gl_FragCoord holds pixel centers, so its values actually range over [0.5, Dimension - 0.5] (unless you redeclare it with the pixel_center_integer layout qualifier); dividing by Dimension therefore lands on texel centers, which is what you want when sampling a screen-sized buffer. (https://www.opengl.org/sdk/docs/man/html/gl_FragCoord.xhtml)
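A tiny C++ illustration of that normalization, assuming the default pixel-center convention; `fragCoordToUv` is a hypothetical helper.

```cpp
struct Vec2 { float x, y; };

// Default OpenGL convention: gl_FragCoord.xy of pixel (i, j) is (i + 0.5, j + 0.5),
// so dividing by the framebuffer size yields that pixel's texel-center uv.
Vec2 fragCoordToUv(int i, int j, float width, float height) {
  return {(static_cast<float>(i) + 0.5f) / width, (static_cast<float>(j) + 0.5f) / height};
}
```

For an 800×600 target the first pixel maps to u = 0.5/800 and the last to 799.5/800, so the normalized coordinates never reach exactly 0 or 1.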
pi is PI, as in 3.14159… etc. etc. – You’d think pi2 would be pi-halves, but it’s not – looks like it’s just good-old 2*pi (https://github.com/TReed0803/QtOpenGL/blob/e12eb713759c296a6791df247cc8bf92970423c7/resources/shaders/Math.glsl) Not sure what I was thinking with the naming scheme there.
I’m not sure what Vizard is, but I can’t imagine there would be an old enough OpenGL implementation to not allow you to store data in a texture representing the normals (that’s all I do here, nothing fancy). If you can store buffers of stuff, see if there’s a way to generate a buffer of normals, and attach that as input – otherwise I’m not sure what to say.
Sorry for the late reply, by the way – Hope this information helps, though!