ZoeDepth

Pixel to 3D Point

Open christuchez opened this issue 2 years ago • 6 comments

If I take an image, generate the depth map, and then generate 3D points, how can I map a specific 2D pixel to a 3D value? For example, pixel (34, 56) in my original image is still (34, 56) in the depth map, so I can read the depth at that pixel. But how do I get the corresponding values from the 3D mesh?

christuchez avatar Sep 08 '23 17:09 christuchez

We also started asking this question in #10, but we have not found an answer yet.

Teifoc avatar Sep 16 '23 11:09 Teifoc

What I did was output the values to a binary file. You can then read the file to find the values.

```python
import os
import torch

# Assumes `model` (a loaded ZoeDepth model), `image` (a PIL image) and
# `image_directory` (the output folder) are already defined.

# Estimate depth directly from the PIL image (runs on the GPU if the model is on it)
depth_data = model.infer_pil(image, output_type="tensor")

# Move to CPU and convert to float32
depth_data_cpu = depth_data.cpu().type(torch.float32)

# Convert to a NumPy array
depth_data_numpy = depth_data_cpu.numpy()

# Combine all rows into one flat array (row-major order)
depth_data_flat = depth_data_numpy.flatten()

# Output binary file path
output_path = os.path.join(image_directory, "depth.bin")

# Write the raw float32 depth values to the binary file
with open(output_path, "wb") as file:
    file.write(depth_data_flat.tobytes())
```
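Reading a value back could look like the minimal sketch below. It assumes the file contains exactly `height * width` float32 values written row by row, as the snippet above produces for a 2D depth map. The function name `read_depth_at` and its parameters are hypothetical, not part of ZoeDepth.

```python
import numpy as np

def read_depth_at(path, width, height, px, py):
    """Return the depth stored for pixel (px, py); px is the column, py is the row."""
    # Assumption: the file holds height * width float32 values in row-major order.
    depth = np.fromfile(path, dtype=np.float32).reshape(height, width)
    # NumPy indexes as [row, column], so the y coordinate comes first.
    return float(depth[py, px])
```

For the pixel in the original question, this would be called as `read_depth_at("depth.bin", width, height, 34, 56)`, with `width` and `height` taken from the depth map that was written.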

michaeloder avatar Nov 22 '23 01:11 michaeloder

That gives only the depth. I think what he wants is the 3D coordinate corresponding to the pixel, which is also what I am looking for. Is there a solution for this?

toannguyen1904 avatar Nov 28 '23 03:11 toannguyen1904

For the x and y coordinates, you just need the pixel location and the projection factors for x and y.

Unfortunately, the projection factors are specific to the image and the camera used to take it, so if you don't know them, you'll need to tweak them until the result looks right.

z = value you read
x = z * projectionFactor.x * (pixel.x - center.x) / width
y = z * projectionFactor.y * (pixel.y - center.y) / height

For example: the image is 192x384, so the center is (96, 192), and projectionFactor = (1.1, 1.2).

If you read pixel (12, 23) with z = 5.1 m:

x = 5.1 * 1.1 * (12 - 96) / 192 = -2.45
y = 5.1 * 1.2 * (23 - 192) / 384 = -2.69

The positions are in camera space.
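As a minimal sketch, the same calculation can be wrapped in a small function. The name `pixel_to_camera_xyz` and its parameters are hypothetical. The image center is assumed to be exactly half the width and height, as in the example above.

```python
def pixel_to_camera_xyz(px, py, z, width, height, proj_x, proj_y):
    """Map pixel (px, py) with depth z to camera-space (x, y, z)."""
    # Assumption: the principal point sits at the image center.
    center_x = width / 2.0
    center_y = height / 2.0
    x = z * proj_x * (px - center_x) / width
    y = z * proj_y * (py - center_y) / height
    return x, y, z

# Worked example from above: 192x384 image, projectionFactor = (1.1, 1.2)
x, y, z = pixel_to_camera_xyz(12, 23, 5.1, 192, 384, 1.1, 1.2)
print(round(x, 2), round(y, 2), z)
```

The projection factors still have to come from the camera, or be tuned by hand, as noted above.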

michaeloder avatar Nov 28 '23 19:11 michaeloder

How do I find projectionFactor?

nguyenbamanh1007 avatar Jun 29 '24 06:06 nguyenbamanh1007

You can use the focal length instead of projectionFactor. Here is the modified formula:

z = depth value you read
x = z * (pixel_x - center_x)/f_x
y = z * (pixel_y - center_y)/f_y

where

f_x: focal length along the x-axis, in pixels
f_y: focal length along the y-axis, in pixels
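Applied to a whole depth map, the formula could look like the sketch below. It is a rough illustration under stated assumptions, not ZoeDepth code: `backproject` and `focal_from_fov` are hypothetical helper names, the principal point defaults to the image center, and the focal length is derived from a known field of view with the standard pinhole relation f = (size / 2) / tan(fov / 2). Comparing the two formulas in this thread also gives projectionFactor.x = width / f_x.

```python
import math
import numpy as np

def focal_from_fov(size_px, fov_deg):
    """Focal length in pixels for an image dimension and its field of view."""
    return (size_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def backproject(depth, f_x, f_y, c_x=None, c_y=None):
    """Turn an (H, W) depth map into an (H, W, 3) array of camera-space points."""
    h, w = depth.shape
    # Assumption: if no principal point is given, use the image center.
    c_x = w / 2.0 if c_x is None else c_x
    c_y = h / 2.0 if c_y is None else c_y
    # u holds the column index (pixel_x), v the row index (pixel_y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = depth * (u - c_x) / f_x
    y = depth * (v - c_y) / f_y
    return np.stack([x, y, depth], axis=-1)
```

The 3D point for pixel (34, 56) is then `points[56, 34]`, since the array is indexed by row first and column second.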

mikami520 avatar Aug 11 '24 02:08 mikami520