Depth Map

Open rosejn opened this issue 2 years ago • 7 comments

I see that the per-gaussian depths are returned, but how about (optionally?) also returning a fully rasterized depth map, so we can use a depth-based loss during training as well?

Thanks for writing gsplat! It's a great implementation.

rosejn avatar Oct 20 '23 02:10 rosejn

Hey @rosejn, here is a quick hack that lets you render depths:

depth_out = RasterizeGaussians.apply(
    self.xys,                           # projected 2D gaussian centers
    self.depths,                        # per-gaussian z-depths (used for sorting)
    self.radii,
    self.conics,
    self.num_tiles_hit,
    self.depths[:, None].repeat(1, 3),  # feed the depths in as the "colors" to be blended
    torch.sigmoid(self.opacities),
    self.H,
    self.W,
)[..., 0:1]                             # all three channels are identical, keep one
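
If the goal is the depth-based loss from the original question, here is a minimal sketch of how depth_out could be plugged in (gt_depth, rgb_loss, and depth_lambda are placeholder names I'm assuming, not part of gsplat):

import torch.nn.functional as F

# gt_depth: (H, W, 1) ground-truth depth; assume 0 marks missing measurements
valid = gt_depth > 0
depth_loss = F.l1_loss(depth_out[valid], gt_depth[valid])
loss = rgb_loss + depth_lambda * depth_loss  # depth_lambda is a user-chosen weight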

maturk avatar Oct 22 '23 18:10 maturk

I found that this doesn't really get you the depth, due to the varying transmittance of the splats and the alpha compositing. Ideally you would use min blending, but you can approximate a min by transforming to exponential space first (since log(e^x + e^y) is approximately max(x, y), and negating gives a soft min), which gets you something closer. It's still not correct though; ideally you need to compute the expected ray termination depth.
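
To make the log-sum-exp approximation concrete, a small PyTorch sketch (not gsplat code, just the identity being relied on):

import torch

x = torch.tensor([1.0, 5.0, 2.5])
soft_max = torch.logsumexp(x, dim=0)    # ~5.10, close to x.max() = 5
soft_min = -torch.logsumexp(-x, dim=0)  # ~0.78, close to x.min() = 1
# log-sum-exp acts as a smooth max, and negating it gives a smooth min,
# which is what a "closest surface" depth blend wants.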

cat-state avatar Oct 22 '23 23:10 cat-state

@cat-state, any ideas on how to do the expected ray termination with these gaussians? In the nerfstudio implementation, the expected depth just falls out of the rendering equation (accumulated weights) for each pixel. Here we basically have the z-depths of sparse points, but how do we blend them correctly to make a continuous depth map?

maturk avatar Oct 23 '23 07:10 maturk

I asked one of the authors as they'd mentioned expected ray termination in a tweet and the method they described is what you're already doing, so I was wrong there.

However, I think the results it produces are "off".

Here is a sphereish blob of opacity = 1 gaussians: [screenshot]

Here it is rendered with the "accumulate depth like color" method; notice how the sphere looks "domed in": [screenshot]

With soft-min blending (also wrong, but arguably less wrong), something like log(sum(exp(depth.max() - depth) * alpha * T)): [screenshot]
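
For reference, a per-pixel sketch of that soft-min blend in PyTorch; this is my paraphrase of the formula above, not code from gsplat, and the weight normalization and the final depth.max() offset are assumptions:

import torch

def softmin_depth(depths, alphas, trans):
    # depths: (N,) z-depths of the gaussians covering one pixel, front to back
    # alphas: (N,) per-gaussian alpha at this pixel
    # trans:  (N,) transmittance remaining before each gaussian
    w = alphas * trans                  # the usual compositing weights
    w = w / (w.sum() + 1e-8)            # normalize so a single opaque splat returns its own depth
    d_max = depths.max()
    # small depths get the largest exponentials, so the sum is dominated by the closest splat
    return d_max - torch.log((w * torch.exp(d_max - depths)).sum() + 1e-8)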

cat-state avatar Oct 23 '23 14:10 cat-state

Since multiple gaussians are blended to produce each pixel, I think the right way to predict depth is likely to use the same factor that is multiplied by the color values to average the depths of each constituent gaussian. If I'm understanding the implementation correctly, that would be here:

https://github.com/nerfstudio-project/gsplat/blob/main/gsplat/cuda/csrc/forward.cu#L548

This is still a bit of an estimate, because I don't think it would really take the rotation into account, but in most cases the rotation will be oriented such that neighboring gaussians point toward each other, so this will likely produce the same depth. On the other hand, it is virtually free to produce without any additional computation. I'm trying it out in a fork and I'll share a link as soon as I have it running. Any thoughts or feedback would be appreciated.
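
As a reference for what "same factor as the color values" means, here is a per-pixel PyTorch sketch of that weighting (the final normalization by accumulated opacity is my own addition, not something the linked CUDA kernel does):

import torch

def expected_depth(depths, alphas):
    # depths, alphas: (N,) for the gaussians covering one pixel, sorted front to back
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
    w = alphas * trans                   # same weights the rasterizer applies to the colors
    d = (w * depths).sum()               # expected ray-termination depth
    # optional: renormalize so pixels that are not fully opaque are not biased toward zero
    return d / (w.sum() + 1e-8)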

rosejn avatar Oct 27 '23 21:10 rosejn

Yeah, I agree. The linked code does this blending (similar to color blending) for the depth values, and the orientation is taken into account via the conic parameters, just like for color blending.

maturk avatar Oct 28 '23 08:10 maturk

Here's a first shot at rendering depth in the forward pass. In the image I've reverse-projected a point cloud from synthetic depth values and then rendered it after just a little bit of training, so you can still see points in the distance that haven't grown large enough yet. If a pixel hasn't intersected any gaussian, I set the depth to zero, so it is undefined, as would be typical with a depth sensor.

[image]

The updates are in a fork here, and while I've tried to update things where appropriate, I haven't tested or verified anything yet beyond seeing that the depth looks reasonable given the scene. In terms of the interface, this only affects RasterizeGaussians, by adding the point depths as an input argument and rendered_depth as a second output argument.

https://github.com/originrose/gsplat/tree/depth

If anyone with more knowledge of the code base wants to help finish the backward pass that would be great. I haven't looked into it yet.
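
In case it's useful for sanity-checking results like the image above, here is a quick way to eyeball a rendered depth map (rendered_depth is a placeholder name for the fork's second output, assumed to have shape (H, W, 1); zero is treated as "no hit", matching the convention described above):

import torch
import matplotlib.pyplot as plt

d = rendered_depth.squeeze(-1).detach().cpu()   # (H, W)
valid = d > 0                                   # zero marks pixels with no gaussian hit
vis = torch.zeros_like(d)
vis[valid] = (d[valid] - d[valid].min()) / (d[valid].max() - d[valid].min() + 1e-8)
plt.imshow(vis.numpy(), cmap="turbo")           # invalid pixels stay at 0
plt.colorbar()
plt.show()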

rosejn avatar Oct 28 '23 21:10 rosejn