Project a pixel accurately to 3D
Hi all,
I'm currently working on a project where I need to project a set of target pixels (those selected based on certain error metrics) into 3D space. The idea is to estimate their 3D locations and then retrieve nearby Gaussians within a certain radius. Since I have both camera intrinsics and extrinsics, my approach is to invert the standard projection pipeline — i.e., use the known pixel coordinates and depth values to back-project into 3D world coordinates.
Here's the function I'm using for this:
def _project_depth(self, u: int, v: int, depth: float, camera) -> np.ndarray:
    if depth <= 0:
        return None
    W, H = camera.image_width, camera.image_height
    # intrinsics
    fy = fov2focal(camera.FoVy, camera.image_height)
    fx = fov2focal(camera.FoVx, camera.image_width)
    cx, cy = W / 2.0, H / 2.0
    # camera space
    x_cam = (u - cx) * depth / fx
    y_cam = -(v - cy) * depth / fy
    z_cam = -depth
    # to homogeneous and world
    cam_coords = torch.tensor([x_cam, y_cam, z_cam, 1.0],
                              dtype=torch.float32,
                              device=camera.data_device)
    cam_to_world = camera.world_view_transform.inverse().to(torch.float32)
    world = cam_to_world @ cam_coords
    return world[:3].cpu().numpy()
However, I'm noticing that the 3D points produced by this projection (shown in gray in this figure) are significantly misaligned from the actual Gaussians (shown in blue). I expected them to land close to the surface of the scene (where Gaussians are located), but instead they're often far off.
After some investigation, I suspect the problem may lie in the depth values I'm using. I'm currently extracting them from the Gaussian renderer. Looking into the forward.cu code, I noticed the depth is computed as: expected_invdepth += depths[collected_id[j]] * alpha * T;
This is an alpha-blended inverse depth, which may not represent the true surface depth along the ray. I also experimented with using the depth of the first Gaussian hit with opacity > 0.1. That improved the projection a bit, but the result is still inaccurate.
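For context, here is roughly how I turn that inverse-depth map into per-pixel depth before back-projecting (a minimal sketch; invdepth_map stands for whatever tensor the rasterizer returns for inverse depth, and the output key name depends on the fork). Note that 1 / E[1/z] only approximates the surface depth wherever several Gaussians at different depths contribute to the same pixel, which may explain part of the error:

import torch

def invdepth_to_depth(invdepth_map: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Convert an alpha-blended inverse-depth map to metric depth.

    Pixels with (near-)zero accumulated inverse depth are set to 0 so the
    caller can skip them instead of back-projecting garbage.
    """
    valid = invdepth_map > eps
    depth = torch.zeros_like(invdepth_map)
    depth[valid] = 1.0 / invdepth_map[valid]
    return depth

# hypothetical usage; "depth" is whatever key your rasterizer fork exposes:
# depth_map = invdepth_to_depth(render_pkg["depth"])
# depth = float(depth_map[v, u])   # fed into _project_depth above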
At this point, I suspect that I may either be:
- Misinterpreting the depth values from the renderer,
- Missing a correction in the transformation chain, or
- Misaligning coordinate conventions (e.g., sign conventions, handedness, etc.).
I'd really appreciate any insights or suggestions. Have I misunderstood how to derive true surface depth from the renderer? Or is there an error in how I back-project pixels into world space?
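In case it is useful for debugging, one check I can do is to run the projection the other way: take a few Gaussian centers that should be visible, push them through the camera's own transforms, and see whether they land on sensible pixels. A rough sketch, assuming camera.full_proj_transform is the transposed world-to-clip matrix used by the rasterizer (so it acts on row vectors), and ignoring the half-pixel offset in the NDC-to-pixel mapping:

import torch

def project_points_to_pixels(xyz_world: torch.Tensor, camera) -> torch.Tensor:
    """Forward-project (N, 3) world-space points to (N, 2) pixel coordinates."""
    n = xyz_world.shape[0]
    ones = torch.ones(n, 1, dtype=xyz_world.dtype, device=xyz_world.device)
    xyz_hom = torch.cat([xyz_world, ones], dim=1)            # (N, 4)
    clip = xyz_hom @ camera.full_proj_transform              # row-vector convention
    ndc = clip[:, :3] / (clip[:, 3:4] + 1e-7)                # perspective divide
    u = (ndc[:, 0] + 1.0) * 0.5 * camera.image_width         # NDC [-1, 1] -> pixels
    v = (ndc[:, 1] + 1.0) * 0.5 * camera.image_height
    return torch.stack([u, v], dim=1)

# If these pixels do not match where the Gaussians actually appear in the
# render, the problem is in the transforms/conventions rather than the depth.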
Thanks in advance.
I think it's caused by a misaligned coordinate convention. The orientation of the coordinate axes in your implementation is as follows:
- x-axis: right
- y-axis: top
- z-axis: back
However, the local camera coordinate system described in the COLMAP documentation is slightly different. Therefore, if you are using data in the COLMAP format, the axes should be changed as follows:
- x-axis: right
- y-axis: bottom
- z-axis: front
x_cam = (u - cx) * depth / fx
y_cam = (v - cy) * depth / fy
z_cam = depth
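Put differently, the two camera-space formulations above differ only by flipping the y and z axes. A tiny numpy illustration with placeholder intrinsics and pixel values:

import numpy as np

# placeholder intrinsics / pixel / depth, just for illustration
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0
u, v, depth = 1200, 300, 2.5

# camera space as in the original function (x right, y up, z backward)
p_gl = np.array([(u - cx) * depth / fx,
                 -(v - cy) * depth / fy,
                 -depth])

# COLMAP-style camera space (x right, y down, z forward)
p_colmap = np.array([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth])

# the two are related by flipping the y and z axes
flip_yz = np.diag([1.0, -1.0, -1.0])
assert np.allclose(flip_yz @ p_gl, p_colmap)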
@Jwonno Yes, that seems to be one of the issues as well. I changed my code as follows:
from utils.graphics_utils import fov2focal, geom_transform_points
import numpy as np
import torch

def _project_depth(self, u: int, v: int, depth: float, cam) -> np.ndarray:
    # intrinsics
    fx = fov2focal(cam.FoVx, cam.image_width)
    fy = fov2focal(cam.FoVy, cam.image_height)
    if hasattr(cam, 'cx') and hasattr(cam, 'cy'):
        # principal point stored at the original resolution; rescale it to the
        # resolution actually used for rendering
        original_w = 2044 * 2
        scale_factor = cam.image_width / original_w
        cx = cam.cx * scale_factor
        cy = cam.cy * scale_factor
    else:
        cx = cam.image_width * 0.5
        cy = cam.image_height * 0.5
    # to camera space (COLMAP convention: x right, y down, z forward)
    x_cam = (u - cx) * depth / fx
    y_cam = (v - cy) * depth / fy
    z_cam = depth  # positive depth
    # to world coordinates
    pts_cam = torch.tensor([[x_cam, y_cam, z_cam]],
                           dtype=torch.float32, device=cam.data_device)
    # camera-to-world transformation
    cam_to_world = cam.world_view_transform.inverse()
    pts_world = geom_transform_points(pts_cam, cam_to_world)
    return pts_world[0].detach().cpu().numpy()
After these changes, I get better results. As shown in this figure, the projected pixels now land very close to the Gaussians and in the same region. But with the same settings, I sometimes still get incorrect projections, as shown in this figure. I suspect this might be due to incorrect depths, since the depth estimates from the renderer are not perfect anyway. If you have any insights, that would be great.
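One mitigation I am considering (just a heuristic, not something from the repo): instead of reading a single pixel from the depth map, take the median over a small window of valid pixels, since the alpha-blended depth mixes foreground and background near silhouettes. A rough sketch:

import torch

def robust_pixel_depth(depth_map: torch.Tensor, u: int, v: int, win: int = 2) -> float:
    """Median depth in a small (2*win+1)^2 window around pixel (u, v).

    Only pixels with depth > 0 are considered; returns 0.0 if none are valid,
    so the caller can skip that pixel.
    """
    h, w = depth_map.shape[-2:]
    v0, v1 = max(0, v - win), min(h, v + win + 1)
    u0, u1 = max(0, u - win), min(w, u + win + 1)
    patch = depth_map[..., v0:v1, u0:u1].reshape(-1)
    patch = patch[patch > 0]
    if patch.numel() == 0:
        return 0.0
    return float(patch.median())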
Thanks for your help in advance!
As far as I know, cam.world_view_transform is stored as the transpose of the World-to-Camera transformation. Therefore, to perform a Camera-to-World transformation, you need to compute cam_to_world = cam.world_view_transform.T.inverse().
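To make the two conventions explicit, a small sketch (assuming geom_transform_points from utils.graphics_utils applies the matrix to row vectors, i.e. computes points_hom @ M):

import torch

def cam_to_world_points(pts_cam: torch.Tensor, world_view_transform: torch.Tensor) -> torch.Tensor:
    """Transform (N, 3) camera-space points to world space.

    Assumes world_view_transform is the transposed world-to-camera matrix,
    as stored on the reference Camera class.
    """
    C2W = world_view_transform.T.inverse()                  # column-vector camera-to-world
    ones = torch.ones(pts_cam.shape[0], 1, dtype=pts_cam.dtype, device=pts_cam.device)
    pts_hom = torch.cat([pts_cam, ones], dim=1)             # (N, 4)
    return (C2W @ pts_hom.T).T[:, :3]

# Equivalent row-vector form: the matrix to pass to geom_transform_points is
# C2W.T, which is simply world_view_transform.inverse(), so
# geom_transform_points(pts_cam, cam.world_view_transform.inverse())
# gives the same result.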