
Getting Metric Depth

talasalim opened this issue 1 year ago • 24 comments

How can I use my data to get the metric depth at a pixel level using the ZoeD model?

talasalim avatar Mar 09 '23 22:03 talasalim

Could you describe in more detail what problems you are facing? The output of the model is the metric depth. If the units seem wildly inaccurate, try loading the model with config_mode=eval. You can use ZoeD_N for indoor scenes, ZoeD_K for outdoor road scenes, and ZoeD_NK for generic scenes.

shariqfarooq123 avatar Mar 19 '23 13:03 shariqfarooq123

@shariqfarooq123 I think @talasalim means how to get a metric distance (e.g. in meters) between two known x,y coordinates of the original picture, or the distance from the camera to an object surface, when providing two x,y coordinates that are known to span a fixed length in the picture for calibration.

Teifoc avatar Mar 24 '23 21:03 Teifoc

@shariqfarooq123 @Teifoc Yes, that is what I meant. Is there a way to get the absolute metric depth at a certain x,y coordinate?

talasalim avatar Mar 28 '23 05:03 talasalim

Following up here. I think you might need to provide the camera intrinsics, which are unique per camera, but I'm assuming these are known for the dataset in question. @talasalim @shariqfarooq123 @Teifoc any ideas?

VibAltekar avatar Apr 04 '23 22:04 VibAltekar

In the file geometry.py I found two functions, get_intrinsics and depth_to_points. I think if we change depth_to_points as follows, we can actually define the camera intrinsics and extrinsics as we want:

import numpy as np

def depth_to_points(depth, K=None, R=None, t=None):
    # Fall back to the intrinsics derived from the image size if none given
    if K is None:
        K = get_intrinsics(depth.shape[1], depth.shape[2])
    Kinv = np.linalg.inv(K)
    # Default extrinsics: identity rotation and zero translation
    if R is None:
        R = np.eye(3)
    if t is None:
        t = np.zeros(3)
    # ... rest of the original depth_to_points body unchanged
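
For reference, here is a minimal sketch of the back-projection itself, in case someone wants to do it outside geometry.py. This is an illustrative helper, not the actual library code; it assumes an (H, W) metric depth map and a 3x3 intrinsics matrix K:

import numpy as np

def backproject_depth(depth, K):
    """Back-project an (H, W) metric depth map to (H, W, 3) camera-frame points."""
    H, W = depth.shape
    Kinv = np.linalg.inv(K)
    # Homogeneous pixel coordinates, shape (3, H*W)
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    rays = Kinv @ pix                  # rays through each pixel
    points = rays * depth.ravel()      # scale each ray by its metric depth
    return points.T.reshape(H, W, 3)   # XYZ per pixel, in the camera frame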

Yarroudh avatar May 27 '23 18:05 Yarroudh

Following up on this, does somebody know which unit is used for the metric depth? Comparing my results to ground truth data ranging from 5 to 45 meters, I have values from 1200 to 8400 in my ZoeDepth output. Is this supposed to be millimeters? Steps of 5 mm?

Sivloc avatar Jun 28 '23 11:06 Sivloc

Hello, sorry, I'm quite a newbie here. So, are the numbers you were mentioning the result of zoe.infer_pil(image)? And can we directly use that to know the estimated metric depth value, or are there other steps to get it?

ariqhadi avatar Jul 11 '23 22:07 ariqhadi

Although the model is trained to predict metric depth, due to the limited data size I think the prediction is still not metrically accurate, but it should be scale-aware (i.e. if an object is twice as far as another, even if the absolute depth is incorrect, the proportion should be the same). In short, I think the number is still "up to some scale".

kwea123 avatar Jul 20 '23 06:07 kwea123

Honestly, I get pretty good results taking the output of zoe.infer_pil(image) directly as millimeters, but some of these algorithms provide an output equivalent to MetricDepth = Scale*OutputDepth + Shift, where scale and shift depend on your camera parameters. If you're not sure about that, you can use linear regression to estimate those parameters, given that you have ground truth.
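
For example, a minimal least-squares fit could look like this (a hypothetical fit_scale_shift helper; pred and gt are assumed to be flattened arrays of predicted and ground-truth depth at the same valid pixels):

import numpy as np

def fit_scale_shift(pred, gt):
    # Solve gt ≈ scale * pred + shift in the least-squares sense
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return scale, shift

# scale, shift = fit_scale_shift(predicted.ravel(), ground_truth.ravel())
# metric_depth = scale * predicted + shift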

Sivloc avatar Jul 20 '23 07:07 Sivloc

The model is trained to predict meters though

kwea123 avatar Jul 20 '23 07:07 kwea123

Could you describe in more detail what problems you are facing? The output of the model is the metric depth. If the units seem wildly inaccurate, try loading the model with config_mode=eval. You can use ZoeD_N for indoor scenes, ZoeD_K for outdoor road scenes, and ZoeD_NK for generic scenes.

Well, it says that the output is metric, not meters, right? At least in my case, if the output were actually meters, it would be insanely inaccurate.

Sivloc avatar Jul 20 '23 07:07 Sivloc

The depth in training and eval is converted to meters:

https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/data/data_mono.py#L353-L354
https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/data/ddad.py#L98
https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/data/diml_indoor_test.py#L97-L98

kwea123 avatar Jul 20 '23 08:07 kwea123

As @kwea123 pointed out, the model was trained with meters as the unit for depth, so the output is always supposed to be in meters. However, the input padding in the infer and infer_pil APIs can easily change the overall scale of the output, though it should be more or less consistent.

Try turning the padding off with pad_input=False (at the cost of border artifacts, see zoedepth.models.depth_model:L57)

TLDR:

import torch
from PIL import Image

zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
image = Image.open("your_image.png").convert("RGB")  # hypothetical input image
predicted_depth = zoe.infer_pil(image, pad_input=False)  # better 'metric' accuracy
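
From there, the metric depth at a given pixel is just an array lookup (infer_pil returns a numpy array by default; the pixel coordinates below are a hypothetical example):

x, y = 100, 200  # hypothetical pixel of interest (column, row)
print(f"Depth at ({x}, {y}): {predicted_depth[y, x]:.2f} m")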

Let me know if this helps

shariqfarooq123 avatar Jul 20 '23 11:07 shariqfarooq123

Okay, thanks a lot! I was actually using the save_raw_16bit function from misc.py, which multiplies all values by 256.

import numpy as np
import torch
from PIL import Image

def save_raw_16bit(depth, fpath="raw.png"):
    if isinstance(depth, torch.Tensor):
        depth = depth.squeeze().cpu().numpy()

    assert isinstance(depth, np.ndarray), "Depth must be a torch tensor or numpy array"
    assert depth.ndim == 2, "Depth must be 2D"
    depth = depth * 256  # scale for 16-bit png
    depth = depth.astype(np.uint16)
    depth = Image.fromarray(depth)
    depth.save(fpath)
    print("Saved raw depth to", fpath)

No wonder I had bad metrics while comparing to ground truth... Thanks for pointing that out!

Sivloc avatar Jul 20 '23 12:07 Sivloc

Interesting! So now are you able to reproduce the ground truth metric depth?

hpstyl avatar Jul 20 '23 12:07 hpstyl

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like (see the attached comparison image). The background is ~30 meters farther away than predicted. Also, I should mention that I used the zoedepth_nk model.

Sivloc avatar Jul 20 '23 12:07 Sivloc

Following up on this, does somebody know which unit is used for the metric depth? Comparing my results to ground truth data ranging from 5 to 45 meters, I have values from 1200 to 8400 in my ZoeDepth output. Is this supposed to be millimeters? Steps of 5 mm?

If you look at the code of the utility function save_raw_16bit, you'll see they take the data, multiply it by 256, and round it off to an unsigned 16-bit integer (so 0 to 65535).

That means you can a) use the raw data yourself, since it is floating-point numbers that represent meters as far as I know (the model can be off, of course),

or b) read in the raw 16-bit integers that you might already have and divide the values by 256 to get close to the original float output of the model.

The values you mention, divided by 256, come closer to the values you describe as the ones you are looking for.
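
For instance, reading such a PNG back into meters might look like this (a minimal sketch; the file name is an assumption):

import numpy as np
from PIL import Image

raw = np.asarray(Image.open("raw.png"), dtype=np.float32)  # 16-bit values
depth_m = raw / 256.0  # undo the *256 scaling; values are (roughly) meters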

(Edit: upon reloading I now see there were already replies and this has been said before. Sorry. When I opened the issue, that part of the discussion wasn't visible to me.)

jorismak avatar Jul 24 '23 07:07 jorismak

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like (see the attached comparison image). The background is ~30 meters farther away than predicted. Also, I should mention that I used the zoedepth_nk model.

When I use the function save_raw_16bit, I only get a totally black picture. How do you get the real distance? Which function do you use? Thank you for your answer!

GinRawin avatar Dec 04 '23 12:12 GinRawin

If using save_raw_16bit: you get back a greyscale image. In other words, you get back width x height, and for every point a number between 0 and 65535, i.e. the 16-bit integer range.

Divide that number by 256 to get what the model predicts as meters. Of course it depends on the camera, model accuracy, upscaling and all that, but the numbers save_raw_16bit writes are meters multiplied by 256, so divide by 256 to get back some sort of meters.
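
That is also why the PNG looks black in most viewers: depth values times 256 are tiny compared to the full 65535 range. A minimal sketch of converting it back and normalizing it for display (file names are assumptions):

import numpy as np
from PIL import Image

raw = np.asarray(Image.open("raw.png"), dtype=np.float32)
depth_m = raw / 256.0  # back to (roughly) meters
vis = (depth_m - depth_m.min()) / (depth_m.max() - depth_m.min() + 1e-8)
Image.fromarray((vis * 255).astype(np.uint8)).save("depth_vis.png")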

jorismak avatar Dec 04 '23 12:12 jorismak

If using save_raw_16bit: you get back a greyscale image. In other words, you get back width x height, and for every point a number between 0 and 65535, i.e. the 16-bit integer range.

Divide that number by 256 to get what the model predicts as meters. Of course it depends on the camera, model accuracy, upscaling and all that, but the numbers save_raw_16bit writes are meters multiplied by 256, so divide by 256 to get back some sort of meters.

Thank you for your help! My code was like this:

import torch
from PIL import Image

image = Image.open("image.png").convert("RGB")
model_zoe_n = torch.hub.load(".", "ZoeD_NK", pretrained=True, source="local")
DEVICE = "cuda:1" if torch.cuda.is_available() else "cpu"
zoe = model_zoe_n.to(DEVICE)
depth = zoe.infer_pil(image)  # numpy array of predicted depth

I find that the numbers save_raw_16bit returns are the depth multiplied by 256. So I think the depth there should already be the real distance in the photo? If I am right, the result is bad. Maybe the reason is that the camera is too close to the object in my photo; it is only about 20 cm away from the camera.

GinRawin avatar Dec 04 '23 13:12 GinRawin

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like (see the attached comparison image). The background is ~30 meters farther away than predicted. Also, I should mention that I used the zoedepth_nk model.

May I ask how you generated your result graph?

807xuan avatar Feb 01 '24 10:02 807xuan

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like (see the attached comparison image). The background is ~30 meters farther away than predicted. Also, I should mention that I used the zoedepth_nk model.

Hello, can you please tell me how you generate the ground truth for an image? I too want to compare my predicted depth with ground truth. Thanks!

Flaviaaa123 avatar May 28 '24 14:05 Flaviaaa123

You can't generate the ground truth, you have to actually measure it. You have two options (that I know of):

  • You can use an RGBD camera that can measure depth
  • You can use software to simulate an RGBD acquisition from 3D models

Sivloc avatar Jun 06 '24 07:06 Sivloc

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like (see the attached comparison image). The background is ~30 meters farther away than predicted. Also, I should mention that I used the zoedepth_nk model.

May I ask how you generated your result graph?

Sorry, I just saw your question. Which result graph are you talking about? For all three of them, I plotted the output matrix. I don't think I still have the code I used.
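
A minimal sketch of such a plot (illustrative only, not the original code; assumes depth is the (H, W) array returned by zoe.infer_pil):

import matplotlib.pyplot as plt

# depth: (H, W) numpy array of predicted metric depth in meters
plt.imshow(depth, cmap="viridis")
plt.colorbar(label="depth (m)")
plt.savefig("depth_plot.png", dpi=150)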

Sivloc avatar Jun 06 '24 07:06 Sivloc