gazebo-classic icon indicating copy to clipboard operation
gazebo-classic copied to clipboard

Shadows at low light angle

Open iche033 opened this issue 1 year ago • 7 comments

Reported by external contributor:

"At low sun angles, there are sometimes these areas in the scene that are not shadowed. With default Gazebo settings the problem is subtle and also hidden from view. To make these unshadowed regions apparent you need to change Line 62 of media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl to return 0.0; This makes it so that areas in front of or behind the shadow map depth range will show as dark on the screen. You can try it with the shadow_coverage.world.

The problem will appear subtle with only small slivers of darkness at the edges of the screen sometimes. The problem gets really bad like you see in the attached screenshot when the split distance lambda is changed from 0.75 to 0.95 (you can change Line 75 of gazebo/rendering/RTShaderSystemPrivate.hh). This is something we want to do to make the shadows much more crisp.

Screenshot_20220823_122036

Some info that make help with debugging:

iche033 avatar Sep 13 '22 23:09 iche033

Mmmm... there were several subtle bugs in the PSSM implementation (both shader and camera setup) that caused these bugs.

I was so happy these bugs were behind me and now I'm being dragged back into it :rofl:

IIRC one quick hack was to increase the caster AABB. I remember back in the days creating an empty (or near invisible) MovableObject with an AABB that is big enough to contain what I wanted to be in the shadows e.g. if the camera is at [15, 26, 47] and the shadow distance was set to 500m, then I'd create an AABB of half size 500m (usually of smaller height to increase quality, e.g. [500, 500, 50] ) and center it around the camera at [15, 26, 47]

This does decrease quality as it forces the PSSM to consider a large caster. However most of the time the "increased" quality was caused by buggy math (i.e. very crisp shadows are useless if there are glitches).

I could take a deeper look at the code to see what's wrong, but I could use a link on how to get gazebo-classic compiled & running.

darksylinc avatar Sep 16 '22 00:09 darksylinc

here are the installation instructions: https://github.com/osrf/gazebo_tutorials/blob/master/install_from_source/tutorial_default.md

Instead of removing all ignition-* libraries as suggested in the instructions, I just put the ignition dependencies in my colcon workspace and build them from source along with gazebo11:

  • ign-cmake2
  • ign-common3
  • ign-fuel-tools4
  • ign-math6
  • ign-msgs5
  • ign-tools1
  • ign-transport8
  • sdf9

After everything is built, run the test world:

gazebo --verbose shadow_coverage.world

The gazebo cmd in turn launches two executables, gzserver and gzclient. You can run them separately if needed.

iche033 avatar Sep 16 '22 00:09 iche033

I was able to compile & run gazebo classic.

I backported PSSM & FocusedShadowCameraSetup implementations from OgreNext into Gazebo. Although it compiles and runs; I have no idea why no shadows appear. I could be missing something small or big.

Unfortunately without being able to use RenderDoc (the OpenGL version being used is too old) I am flying blind and I am very limited on diagnosing the problem. I couldn't get the backport to work.

However with this exercise I was able to remember what were the problems with the original implementation:

  1. Shadow Camera never considers caster AABB. Ogre considers the following AABBs:
    • AABB of visible receivers (_sm->getVisibleObjectsBoundsInfo(_cam).receiverAabb, note it says _cam)
    • AABB of visible casters (_sm->getVisibleObjectsBoundsInfo(_texCam).aabb, note it says _textCam) - This is wrong. Visible casters aren't the only thing that matter. If there is a giant pole 10km away casting a shadow on me, that pole can't be ignored. All casters must be considered, not just visible ones. - Most of the time this is not a problem because visible casters and all-casters coincide, or it is at least big enough to not matter. However it fails spectacularly in empty scenes like the one you are providing because some splits literally end up with nothing on the shadow camera (no receivers, no casters in that split)
  2. PointListBody might be buggy. I never got to the bottom of it, and just replaced its use with more straightforward math; limiting how much we use it in OgreNext and giving it simple input (i.e. convex objects).

This is why adding an empty object with a large caster AABB workarounded the problem; because that caster is always visible in all splits.

I could dedicate more time to this; but I'd have to try to compile a newer Ogre 1.x version from source and see if we can get OpenGL 3.x rendersystem running so that it can run on RenderDoc.

darksylinc avatar Sep 17 '22 23:09 darksylinc

FYI @scpeters, @jacobperron, @mogumbo. See workaround mentioned above about using an empty object with large AABB.

I could dedicate more time to this; but I'd have to try to compile a newer Ogre 1.x version from source and see if we can get OpenGL 3.x rendersystem running so that it can run on RenderDoc.

Gazebo-classic is using libogre-1.9-dev on ubuntu. I don't know if the issue is still present in newer version of Ogre 1.x or not. @scpeters, is there a preference to fix this issue from within Gazebo-classic, e.g. by overriding ogre classes, etc, or is it fine to use a separate fork of ogre 1.9 with shadow fixes?

iche033 avatar Sep 19 '22 20:09 iche033

Great detective work @darksylinc. (I remember debugging almost the same offscreen caster problem in OSG about 15 years ago.) As a user who expects to run into this problem with other projects, fixing in Gazebo-classic is preferable. Although, all projects will probably move on to new Gazebo within a few years. With any luck, the fix could be part of PSSM overrides that we have already made in Gazebo. (Those overrides were due to y-up vs z-up difference between Ogre and Gazebo, trouble with LisPSM shadows, and some other details.)

In our current project there's a preference for crisp shadows close to the camera, so if the workaround hurts shadow quality too much it could be a problem for us. I'd have to try it in our scene and see how it looks.

mogumbo avatar Sep 19 '22 21:09 mogumbo

I am unable to work around the problem by adding a large object to shadow_coverage.world. But I'm happy to test more if it helps. I think I would need a better defined test.

mogumbo avatar Sep 21 '22 18:09 mogumbo

I could dedicate more time to this; but I'd have to try to compile a newer Ogre 1.x version from source and see if we can get OpenGL 3.x rendersystem running so that it can run on RenderDoc.

Gazebo-classic is using libogre-1.9-dev on ubuntu. I don't know if the issue is still present in newer version of Ogre 1.x or not. @scpeters, is there a preference to fix this issue from within Gazebo-classic, e.g. by overriding ogre classes, etc, or is it fine to use a separate fork of ogre 1.9 with shadow fixes?

As a user who expects to run into this problem with other projects, fixing in Gazebo-classic is preferable

for this current project, we are building our own version of Ogre from source, but as @mogumbo says, it would be useful to more users to fix it within gazebo-classic, subject to our constraints on preserving API and ABI. Is there a significant difference in the difficulty of either approach?

scpeters avatar Sep 21 '22 23:09 scpeters

TL;DR

I fixed it!

The patch is extremely simple and doesn't break ABI:

diff --git a/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl b/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl
index 24cf275a8d..7a16f78c3a 100644
--- a/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl
+++ b/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl
@@ -78,7 +78,7 @@ void SGX_ApplyShadowFactor_Diffuse(in vec4 ambient,
 float _SGX_ShadowPoisson9(sampler2DShadow shadowMap, vec4 shadowMapPos, vec2 invShadowMapSize, bool hardwarePCF)
 {
   // Remove shadow outside shadow maps so that all that area appears lit
-  if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0)
+  if (shadowMapPos.z < -1.0)
     return 1.0;
 
   float shadow = 0.0;
diff --git a/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl b/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl
index e72adc68ca..f9e38b9cfe 100644
--- a/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl
+++ b/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl
@@ -58,7 +58,7 @@ void SGX_ApplyShadowFactor_Diffuse(in vec4 ambient,
 float _SGX_ShadowPoisson9(sampler2DShadow shadowMap, vec4 shadowMapPos, vec2 invShadowMapSize)
 {
   // Remove shadow outside shadow maps so that all that area appears lit
-  if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0)
+  if (shadowMapPos.z < -1.0)
     return 1.0;
 
   float shadow = 0.0;

Short version

Gazebo added a new routine _SGX_ShadowPoisson9 to improve shadow map quality, but it was buggy.

Ogre 1.9 uses a R32_FLOAT colour texture (NOT a depth texture) to hold the shadow map. Although the depth texture is always in range [0; 1], even in OpenGL, yet the vertex shader must output the depth in range [-1; 1]; thus the colour texture is in range [-1; 1].

Additionally, the routine assumes shadowMapPos.z > 1.0 means there should be no shadow. This is often wrong depending on how the shadow map is implemented. From what I could see, this is wrong for Gazebo/Ogre 1.9's case.

Probably more testing is needed to check nothing else gets broken; but as far as I can see, the correct statement is if (shadowMapPos.z < -1.0) and if this breaks something else, then it's quite likely that something else was broken in some other way to begin with and the bug was hiding it.

Long (useless info) version

Aarghhhhh!!! I think around 28hs must be put to debug this only to boil down to a single line of code!!!

As I mentioned, I backported PSSM code from OgreNext back to Ogre. It wasn't working, so I decided to use it in an Ogre1 sample where I could actually debug it with RenderDoc.

To my surprise... it was working!!! BUT, it was on Ogre master branch, not 1.9. I then tried Ogre 1.9 but alas, out of the box 1.9 was too broken that RTSS didn't even compile correctly on my machine. Therefore I could not quickly test the PSSM implementation on an Ogre 1.9 sample.

After comparing side-by-side debugging my PSSM implementation with near-identical conditions in Ogre master vs gazebo-classic, I nailed it down to a simple mistake of mine where my implementation needed the SceneNode (parent of the Light) to have proper orientation; which is required in Ogre1 master and OgreNext, but not required in Ogre 1.9.

Once I fixed that my backport starting working on gazebo-classic. I was about to call it a day until I noticed the bug was back. This time from a different angle. Which is strange because my PSSM code in Ogre master didn't seem to have this bug.

At some point I noticed the:

if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0)

but decided that the correct code would be:

if (shadowMapPos.z < 0.0)

This fixed some of the problems, but not all of them. So I continued.

What made me completely stumble was at some point I noticed when I changed my code to:

// from:
shadowTexCamera->setFarClipDistance( farPlane );
// to:
shadowTexCamera->setFarClipDistance( farPlane * 2.0f );

All shadows disappeared. Which made absolutely no sense. Under no circumstance this should cause shadows to disappear (unless far plane becomes so ridiculously big it causes floating point precision problems). At worst it would look the same as before, at best some of the missing shadows would begin to appear. (Spoiler alert: increasing far plane range caused depth values to be < 0, that's why it was failing).

Unfortunately, PSSM cascade debugging was added later in Ogre1, and it was not available in 1.9.x; that could have helped a lot.

At some point while having lunch I realized I could use good ol' apitrace.

A now-ancient tool fallen into oblivion by most; but it works with legacy OpenGL. And it did work with Gazebo!

Its UI/UX is a bit crude, but I was able to rule out various possible problems I was worried of. Fortunately it has the ability to inspect textures pixel by pixel, like RenderDoc.

At some point I noticed some of the depth values were -0.066721.... wait. What? Negative values? OpenGL depth buffers receive input in [-1; 1] but GPUs still store depth in [0; 1]. How is that possible???

Upon further the depth texture wasn't a depth texture at all, it was a colour texture. That's when it hit me and everything made sense. I had already encountered the if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0) so that came immediately to mind.

I fix it and... it wasn't working. At some point later on I noticed after going back and forth with apitrace that it was extremely likely my PSSM backport was broken in some subtle way. I then tried the original PSSM implementation. AND VOILÁ! EVERYTHING WORKS!!!

Morale of the story: Gazebo classic uses ancients APIs. In fact Ogre 1.9 seems to be making incorrect OpenGL calls but still manages to work (GL is a quite resilient API after all). These bugs don't seem to be present with Ogre master, but unfortunately gazebo with Ogre 1.11 (not master) boots into a black screen (after workarounding several crashes). In hindsight perhaps the problem may have been that Gazebo provides its own RTSS shader files instead of using Ogre's; and Gazebo's were not compatible "as is" with Ogre RTSS 1.11.

Ancient APIs are hard to debug. This bug would've probably taken me a few hours at most with modern tools and a more recent version of Ogre.

Small note:

If gazebo-classic is ever updated to a newer Ogre version; it is likely that my patch will break things because it is likely Ogre1 will start using a real depth texture instead of emulating it with a colour texture.

darksylinc avatar Oct 03 '22 01:10 darksylinc

Confirmed this works. if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0) is my line of code, so I'm sorry for not testing it better. I'm sure I was assuming a depth texture range of [0, 1]. If using R32_FLOAT for the shadow textures is non-standard, then I recommend commenting in the shader why you're testing against -1 instead of 0.

And it looks to me like we should keep || shadowMapPos.z > 1.0. Without this test if you pull the camera back to the shadowFar distance you'll see shadow where there shouldn't be any.

mogumbo avatar Oct 03 '22 17:10 mogumbo

Confirmed this works. if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0) is my line of code, so I'm sorry for not testing it better. I'm sure I was assuming a depth texture range of [0, 1]. If using R32_FLOAT for the shadow textures is non-standard, then I recommend commenting in the shader why you're testing against -1 instead of 0.

Indeed, pointing it out makes perfect sense. In fact if the D3D9/11 RenderSystems are used, then the range is [0; 1].

This is something that could be fixed somewhere else to provide a range of [0; 1]. However that may break other stuff that does expect [-1; 1].

And it looks to me like we should keep || shadowMapPos.z > 1.0. Without this test if you pull the camera back to the shadowFar distance you'll see shadow where there shouldn't be any.

No, that's wrong. shadowMapPos is in light space.

Diagram

In this diagram, the green circle receiver is behind the shadow camera's far plane; however it should still be shadowed because of the orange caster.

Thus shadowMapPos.z could be 10.0f and it would still not matter, because it is still shadowed by the orange square caster.

EDIT: Ahhhh, there's the issue of the clear colour. If the shadow map's max value is 1.0 (no casters), then 1 < 10; thus it would be in shadow. EDIT: Yeah, shadowMapPos.z > 1 must be preserved.

darksylinc avatar Oct 03 '22 18:10 darksylinc

If I'm following along correctly, I think the following patch is preferred:

diff --git a/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl b/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl
index 24cf275a8d..719fe639b8 100644
--- a/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl
+++ b/media/rtshaderlib/SGXLib_IntegratedPSSM.glsl
@@ -78,7 +78,7 @@ void SGX_ApplyShadowFactor_Diffuse(in vec4 ambient,
 float _SGX_ShadowPoisson9(sampler2DShadow shadowMap, vec4 shadowMapPos, vec2 invShadowMapSize, bool hardwarePCF)
 {
   // Remove shadow outside shadow maps so that all that area appears lit
-  if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0)
+  if (shadowMapPos.z < -1.0 || shadowMapPos.z > 1.0)
     return 1.0;
 
   float shadow = 0.0;
diff --git a/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl b/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl
index e72adc68ca..1fc7002e9c 100644
--- a/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl
+++ b/media/rtshaderlib150/SGXLib_IntegratedPSSM.glsl
@@ -58,7 +58,7 @@ void SGX_ApplyShadowFactor_Diffuse(in vec4 ambient,
 float _SGX_ShadowPoisson9(sampler2DShadow shadowMap, vec4 shadowMapPos, vec2 invShadowMapSize)
 {
   // Remove shadow outside shadow maps so that all that area appears lit
-  if (shadowMapPos.z < 0.0 || shadowMapPos.z > 1.0)
+  if (shadowMapPos.z < -1.0 || shadowMapPos.z > 1.0)
     return 1.0;
 
   float shadow = 0.0;

I don't understand shaders, so I don't know how to write an intelligent commit message for this. @darksylinc do you mind opening a pull request for this patch so we can continue review?

scpeters avatar Oct 03 '22 21:10 scpeters