
true metric depth values

Open abhishekmonogram opened this issue 1 year ago • 76 comments

Hi @LiheYoung,

This is super impressive work. I used the Hugging Face deployment to test out the network. I gave it a sample image from a camera with known intrinsics, and it output a depth map (treated as disparity, as the Hugging Face page says). I can see the per-pixel values of the depth/disparity map, but I do not know how to extract per-pixel true metric depth from them. Are the depth maps relative, or are they true metric? If they are true metric, how can I extract per-pixel metric depth?

abhishekmonogram avatar Jan 25 '24 16:01 abhishekmonogram

Hi @LiheYoung, I have the same question: how do I get metric depth information from the disparity maps? The information under metric_depth covers only evaluation results, not the true depth values. I would really appreciate it if you could share your insights on this.

BTW you guys did a really amazing job here.

Abubakar17 avatar Jan 25 '24 23:01 Abubakar17

I believe using the "pred" model output from the evaluate.py script https://github.com/LiheYoung/Depth-Anything/blob/5935968f82018d68fff44946573d34cdf27db827/metric_depth/evaluate.py#L80 (assuming you assign the correct focal length in the line above the model output) and using this https://github.com/LiheYoung/Depth-Anything/blob/main/metric_depth/zoedepth/utils/geometry.py should be all you need.

Based on the ZoeDepth training pipeline, the model output is metric depth in units of meters.
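
For what it's worth, here is a minimal back-projection sketch of the kind of computation geometry.py presumably performs (the function name and arguments are illustrative, not the repo's API); it assumes a metric depth map in meters and pinhole intrinsics (fx, fy, cx, cy):

import numpy as np

def backproject_to_points(depth, fx, fy, cx, cy):
    # depth: (H, W) array of metric depth in meters.
    # Returns an (H*W, 3) array of 3D points in the camera frame.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    return np.stack((x, y, z), axis=-1).reshape(-1, 3)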

loevlie avatar Jan 26 '24 00:01 loevlie

Hi @abhishekmonogram and @Abubakar17, the demo on Hugging Face only outputs relative depth (disparity), not metric depth. As @loevlie mentioned, if you want to obtain metric depth values, please refer to https://github.com/LiheYoung/Depth-Anything/tree/main/metric_depth. You may also refer to the files @loevlie mentioned.

LiheYoung avatar Jan 26 '24 02:01 LiheYoung

Thank you @loevlie for providing those resources. The evaluate function still only evaluates on a standard dataset like NYU, right? Do you know if there is a script that directly runs inference on an arbitrary custom image?

@LiheYoung If Hugging Face outputs only disparity, how do I get a depth map from it? To go from disparity to depth you also need the baseline, which is missing in the case of a monocular camera.

Also, could you comment on the accuracy of the per-pixel metric depth when fine-tuned on your own dataset? I read through the table in the paper, but found the metrics a little confusing to interpret.
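
For reference, the tables report the standard metric-depth evaluation metrics; this is a sketch of the usual definitions, not taken from the paper itself, so check the paper for the exact protocol:

\mathrm{AbsRel} = \frac{1}{N}\sum_{i=1}^{N}\frac{|\hat{d}_i - d_i|}{d_i},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{d}_i - d_i)^2},
\qquad
\delta_k = \frac{1}{N}\,\Big|\Big\{\, i : \max\big(\hat{d}_i/d_i,\; d_i/\hat{d}_i\big) < 1.25^{k} \Big\}\Big|

where \hat{d}_i is the predicted depth and d_i the ground truth. AbsRel and RMSE are errors (lower is better), while \delta_1 is the fraction of pixels whose prediction is within 25% of the true depth (higher is better).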

abhishekmonogram avatar Jan 26 '24 15:01 abhishekmonogram

Hey @abhishekmonogram, I got the evaluate function to work on my own dataset mainly by following this article. I might write a script to run inference on a custom image; if I do, I will share it.

loevlie avatar Jan 26 '24 15:01 loevlie

[Screenshot: point cloud visualisation]

The visualisation looks pretty good. @LiheYoung, is it possible to confirm that a depth value of, say, 4.54 is in metres and that there are no additional scale factors at play?

1ssb avatar Jan 26 '24 16:01 1ssb

I believe it would be extremely helpful to have a function that accepts an image path and a focal length (with the current default value) as inputs, and then generates a depth map with metric values. It can be quite challenging for someone who isn't deeply involved in this specific field to create such a method.

shizurumaya avatar Jan 26 '24 17:01 shizurumaya

I will update with my code right here, just waiting for the confirmation of the author on the correctness of scale.


1ssb avatar Jan 27 '24 04:01 1ssb

Hi @1ssb, our Depth Anything models primarily focus on relative depth estimation, so the output values from the models published on Hugging Face do not carry any metric meaning. However, if you want to obtain metric depth information (in meters), you can use the models introduced here: https://github.com/LiheYoung/Depth-Anything/tree/main/metric_depth, just like @loevlie mentioned.

LiheYoung avatar Jan 27 '24 04:01 LiheYoung

Hi @LiheYoung, I am indeed using the metric depth model, and the point cloud I uploaded is indeed from ZoeDepth. Can you kindly confirm that if these depth values are, for example, 4.35 metres, they are indeed in metres without any need for further analysis/transformation?


1ssb avatar Jan 27 '24 04:01 1ssb

Yes, they are indeed in meters.

LiheYoung avatar Jan 27 '24 06:01 LiheYoung

Ok, here is my code. Let me know if you find any glitches. @LiheYoung, you can integrate this file as a commit, with changes, if you would like.

Edit: Revised, updated, and simplified the code so it can handle any output size.

# infer.py
# Code by @1ssb
import argparse
import os
import glob
import torch
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
import open3d as o3d
from tqdm import tqdm
from zoedepth.models.builder import build_model
from zoedepth.utils.config import get_config

# Global settings
FL = 715.0873
FY = 256 * 0.6
FX = 256 * 0.6
NYU_DATA = False
FINAL_HEIGHT = 256
FINAL_WIDTH = 256
P_x, P_y = 128, 128
INPUT_DIR = './my_test/input'
OUTPUT_DIR = './my_test/output'
DATASET = 'nyu' # Let's not pick a fight with the model's dataloader

def process_images(model):
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

    image_paths = glob.glob(os.path.join(INPUT_DIR, '*.png')) + glob.glob(os.path.join(INPUT_DIR, '*.jpg'))
    for image_path in tqdm(image_paths, desc="Processing Images"):
        try:
            color_image = Image.open(image_path).convert('RGB')
            original_width, original_height = color_image.size
            image_tensor = transforms.ToTensor()(color_image).unsqueeze(0).to('cuda' if torch.cuda.is_available() else 'cpu')

            pred = model(image_tensor, dataset=DATASET)
            if isinstance(pred, dict):
                pred = pred.get('metric_depth', pred.get('out'))
            elif isinstance(pred, (list, tuple)):
                pred = pred[-1]
            pred = pred.squeeze().detach().cpu().numpy()

            # Resize color image and depth to final size
            resized_color_image = color_image.resize((FINAL_WIDTH, FINAL_HEIGHT), Image.LANCZOS)
            resized_pred = Image.fromarray(pred).resize((FINAL_WIDTH, FINAL_HEIGHT), Image.NEAREST)

            focal_length_x, focal_length_y = (FX, FY) if not NYU_DATA else (FL, FL)
            x, y = np.meshgrid(np.arange(FINAL_WIDTH), np.arange(FINAL_HEIGHT))
            x = (x - P_x) / focal_length_x
            y = (y - P_y) / focal_length_y
            z = np.array(resized_pred)
            points = np.stack((np.multiply(x, z), np.multiply(y, z), z), axis=-1).reshape(-1, 3)
            colors = np.array(resized_color_image).reshape(-1, 3) / 255.0

            pcd = o3d.geometry.PointCloud()
            pcd.points = o3d.utility.Vector3dVector(points)
            pcd.colors = o3d.utility.Vector3dVector(colors)
            o3d.io.write_point_cloud(os.path.join(OUTPUT_DIR, os.path.splitext(os.path.basename(image_path))[0] + ".ply"), pcd)
        except Exception as e:
            print(f"Error processing {image_path}: {e}")

def main(model_name, pretrained_resource):
    config = get_config(model_name, "eval", DATASET)
    config.pretrained_resource = pretrained_resource
    model = build_model(config).to('cuda' if torch.cuda.is_available() else 'cpu')
    model.eval()
    process_images(model)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", type=str, default='zoedepth', help="Name of the model to test")
    parser.add_argument("-p", "--pretrained_resource", type=str, default='local::./checkpoints/depth_anything_metric_depth_indoor.pt', help="Pretrained resource to use for fetching weights.")

    args = parser.parse_args()
    main(args.model, args.pretrained_resource)
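
For reference, with the defaults above the script would be run from the repo's metric_depth folder (so that the zoedepth imports resolve), roughly as follows; the checkpoint path is the script's own default and may need adjusting:

python infer.py -m zoedepth -p local::./checkpoints/depth_anything_metric_depth_indoor.pt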

1ssb avatar Jan 27 '24 06:01 1ssb

Ok, here is my code. Let me know if you find any glitches. @LiheYoung, you can integrate this file as a commit, with changes, if you would like.

# infer.py
# Code by @1ssb

import argparse
from tqdm import tqdm
import os, glob, torch
from PIL import Image
import torchvision.transforms as transforms
import numpy as np
import open3d as o3d
from zoedepth.models.builder import build_model
from zoedepth.utils.config import get_config

# Focal length settings
FL = 715.0873  # Default focal length, used if NYU_DATA is False
FY = 234.72  # Focal length in Y-axis
FX = 307.2   # Focal length in X-axis
NYU_DATA = False  # Flag to indicate if NYU data-specific settings are used

def infer(model, image, dataset):
    """
    Performs model inference on a single image.
    
    Args:
        model (torch.nn.Module): The depth estimation model.
        image (torch.Tensor): The input image tensor.
        dataset (str): The name of the dataset being used.

    Returns:
        torch.Tensor: Predicted depth map.
    """
    pred = model(image, dataset=dataset)
    return pred

def get_depth_from_prediction(pred):
    """
    Extracts the depth map from model prediction.

    Args:
        pred (torch.Tensor | list | tuple | dict): Model prediction.

    Returns:
        torch.Tensor: Extracted depth map.
    """
    if isinstance(pred, torch.Tensor):
        return pred
    elif isinstance(pred, (list, tuple)):
        return pred[-1]
    elif isinstance(pred, dict):
        return pred.get('metric_depth', pred.get('out'))
    else:
        raise TypeError(f"Unknown output type {type(pred)}")

def depth_to_point_cloud(depth, color_image):
    """
    Converts a depth map and a color image to a 3D point cloud.

    Args:
        depth (numpy.ndarray): The depth map.
        color_image (PIL.Image): The color image.

    Returns:
        tuple: Tuple containing points and colors for the point cloud.
    """
    height, width = depth.shape
    color_image = color_image.resize((width, height))
    focal_length_x, focal_length_y = (FL, FL) if NYU_DATA else (FX, FY)

    x, y = np.meshgrid(np.arange(width), np.arange(height))
    x = (x - width / 2) / focal_length_x
    y = (y - height / 2) / focal_length_y

    z = depth
    x = np.multiply(x, z)
    y = np.multiply(y, z)

    points = np.stack((x, y, z), axis=-1).reshape(-1, 3)
    colors = np.array(color_image).reshape(-1, 3) / 255.0

    return points, colors

def process_image(model, image_path, output_dir, dataset):
    """
    Processes a single image, performs depth estimation, and saves the resulting point cloud.

    Args:
        model (torch.nn.Module): The depth estimation model.
        image_path (str): Path to the image file.
        output_dir (str): Directory to save the point cloud.
        dataset (str): The name of the dataset being used.
    """
    color_image = Image.open(image_path).convert('RGB')
    image_tensor = transforms.ToTensor()(color_image).unsqueeze(0).to('cuda' if torch.cuda.is_available() else 'cpu')

    pred_dict = infer(model, image_tensor, dataset)
    pred = get_depth_from_prediction(pred_dict).squeeze().detach().cpu().numpy()

    points, colors = depth_to_point_cloud(pred, color_image)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.colors = o3d.utility.Vector3dVector(colors)

    min_depth, max_depth = np.min(pred[pred > 0]), np.max(pred)
    print(f"Processed {image_path}: Min Depth: {min_depth}, Max Depth: {max_depth}")

    output_filename = os.path.join(output_dir, os.path.splitext(os.path.basename(image_path))[0] + ".ply")
    o3d.io.write_point_cloud(output_filename, pcd)

def main(config, input_dir, output_dir, dataset):
    """
    Main function to process all images in a directory.

    Args:
        config (dict): Configuration for the model.
        input_dir (str): Directory containing input images.
        output_dir (str): Directory to save point clouds.
        dataset (str): The name of the dataset being used.
    """
    model = build_model(config).to('cuda' if torch.cuda.is_available() else 'cpu')
    model.eval()

    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    image_paths = glob.glob(os.path.join(input_dir, '*.png')) + glob.glob(os.path.join(input_dir, '*.jpg'))
    if not image_paths:
        print("No images found in the input directory.")
        return

    for image_path in tqdm(image_paths, desc="Processing Images"):
        try:
            process_image(model, image_path, output_dir, dataset)
        except Exception as e:
            print(f"Error processing {image_path}: {e}")

def test_model(model_name, pretrained_resource, input_dir, output_dir, dataset):
    """
    Tests a model with given parameters.

    Args:
        model_name (str): The name of the model.
        pretrained_resource (str): Path to pretrained model weights.
        input_dir (str): Directory containing input images.
        output_dir (str): Directory to save point clouds.
        dataset (str): The name of the dataset being used.
    """
    config = get_config(model_name, "eval", dataset)
    if pretrained_resource:
        config.pretrained_resource = pretrained_resource
    main(config, input_dir, output_dir, dataset)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Depth estimation and point cloud generation script.")
    parser.add_argument("-m", "--model", type=str, default='zoedepth', help="Name of the model to test")
    parser.add_argument("-p", "--pretrained_resource", type=str, default='local::./checkpoints/depth_anything_metric_depth_indoor.pt', help="Pretrained resource to use for fetching weights.")
    parser.add_argument("-d", "--dataset", type=str, default='nyu', help="Dataset to evaluate on")
    parser.add_argument("-i", "--input_dir", type=str, default='./my_test/input', help="Input directory containing images")
    parser.add_argument("-o", "--output_dir", type=str, default='./my_test/output', help="Output directory for point clouds")
    args = parser.parse_args()

    test_model(args.model, args.pretrained_resource, args.input_dir, args.output_dir, args.dataset)

Hi @1ssb, thanks for the code. I found that, whatever the original size of the image is, the output depth has a shape of (392, 518). Is there a way to obtain depth at the original image resolution? Does interpolation need to be performed here?
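
A minimal sketch of that interpolation, in case it is useful (my own, not part of the repo; it assumes the 4-D prediction tensor before the .squeeze() in the script above):

import torch
import torch.nn.functional as F

def resize_depth(pred_t: torch.Tensor, orig_h: int, orig_w: int) -> torch.Tensor:
    # pred_t: (1, 1, H, W) metric depth prediction. Bilinear interpolation only
    # changes the sampling grid; the values themselves stay in meters.
    return F.interpolate(pred_t, size=(orig_h, orig_w), mode="bilinear", align_corners=False)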

yyvhang avatar Jan 30 '24 13:01 yyvhang

Hi @yyvhang, please check the updated code.

1ssb avatar Jan 30 '24 13:01 1ssb

Hi @yyvhang, please check the updated code.

Thanks!

yyvhang avatar Jan 31 '24 08:01 yyvhang

I believe using the "pred" model output from the evaluate.py script

https://github.com/LiheYoung/Depth-Anything/blob/5935968f82018d68fff44946573d34cdf27db827/metric_depth/evaluate.py#L80

(assuming you assign the correct focal length in the line above the model output) and using this https://github.com/LiheYoung/Depth-Anything/blob/main/metric_depth/zoedepth/utils/geometry.py should be all you need, since, based on the ZoeDepth training pipeline, the model output is metric depth in units of meters.

Are you sure that the focal argument (focal=focal) is necessary, or that it even does anything? I cannot see it being used in any forward method of the metric depth models, only in the eval dataloader, @LiheYoung @loevlie.

Also, @LiheYoung, how much more accurate is inference with flip augmentation, which is the default in the evaluation script?

DiTo97 avatar Jan 31 '24 11:01 DiTo97

Ok, here is my code. Let me know if you find any glitches. @LiheYoung, you can integrate this file as a commit, with changes, if you would like.

Edit: Revised, updated, and simplified the code so it can handle any output size.

# infer.py
# Code by @1ssb
import argparse
import os
import glob
import torch
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
import open3d as o3d
from tqdm import tqdm
from zoedepth.models.builder import build_model
from zoedepth.utils.config import get_config

# Global settings
FL = 715.0873
FY = 256 * 0.6
FX = 256 * 0.6
NYU_DATA = False
FINAL_HEIGHT = 256
FINAL_WIDTH = 256
INPUT_DIR = './my_test/input'
OUTPUT_DIR = './my_test/output'
DATASET = 'nyu' # Let's not pick a fight with the model's dataloader

def process_images(model):
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

    image_paths = glob.glob(os.path.join(INPUT_DIR, '*.png')) + glob.glob(os.path.join(INPUT_DIR, '*.jpg'))
    for image_path in tqdm(image_paths, desc="Processing Images"):
        try:
            color_image = Image.open(image_path).convert('RGB')
            original_width, original_height = color_image.size
            image_tensor = transforms.ToTensor()(color_image).unsqueeze(0).to('cuda' if torch.cuda.is_available() else 'cpu')

            pred = model(image_tensor, dataset=DATASET)
            if isinstance(pred, dict):
                pred = pred.get('metric_depth', pred.get('out'))
            elif isinstance(pred, (list, tuple)):
                pred = pred[-1]
            pred = pred.squeeze().detach().cpu().numpy()

            # Resize color image and depth to final size
            resized_color_image = color_image.resize((FINAL_WIDTH, FINAL_HEIGHT), Image.LANCZOS)
            resized_pred = Image.fromarray(pred).resize((FINAL_WIDTH, FINAL_HEIGHT), Image.NEAREST)

            focal_length_x, focal_length_y = (FX, FY) if not NYU_DATA else (FL, FL)
            x, y = np.meshgrid(np.arange(FINAL_WIDTH), np.arange(FINAL_HEIGHT))
            x = (x - FINAL_WIDTH / 2) / focal_length_x
            y = (y - FINAL_HEIGHT / 2) / focal_length_y
            z = np.array(resized_pred)
            points = np.stack((np.multiply(x, z), np.multiply(y, z), z), axis=-1).reshape(-1, 3)
            colors = np.array(resized_color_image).reshape(-1, 3) / 255.0

            pcd = o3d.geometry.PointCloud()
            pcd.points = o3d.utility.Vector3dVector(points)
            pcd.colors = o3d.utility.Vector3dVector(colors)
            o3d.io.write_point_cloud(os.path.join(OUTPUT_DIR, os.path.splitext(os.path.basename(image_path))[0] + ".ply"), pcd)
        except Exception as e:
            print(f"Error processing {image_path}: {e}")

def main(model_name, pretrained_resource):
    config = get_config(model_name, "eval", DATASET)
    config.pretrained_resource = pretrained_resource
    model = build_model(config).to('cuda' if torch.cuda.is_available() else 'cpu')
    model.eval()
    process_images(model)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", type=str, default='zoedepth', help="Name of the model to test")
    parser.add_argument("-p", "--pretrained_resource", type=str, default='local::./checkpoints/depth_anything_metric_depth_indoor.pt', help="Pretrained resource to use for fetching weights.")

    args = parser.parse_args()
    main(args.model, args.pretrained_resource)

@1ssb, in the original version of the code snippet you had different values for the focal lengths on the x and y axes, while they have now been changed to be equal to the final image size times a scaling factor:

  • why the change, and why fix the scaling factor (fy = 256 * 0.6)?
  • how could we get the focal-adjusted metric depth map instead of the focal-adjusted point cloud?

DiTo97 avatar Jan 31 '24 11:01 DiTo97

Hi @1ssb, thank you a lot for contributing this script! Would you mind making a pull request? You can put this file in our metric_depth folder and maybe name it as depth_to_pointcloud.py? I will merge it to our main branch ASAP.

LiheYoung avatar Jan 31 '24 11:01 LiheYoung

Yeah sure, sending a pull request soon!


1ssb avatar Jan 31 '24 11:01 1ssb

Hi @DiTo97, "why changing and why fixing the scaling factor (fy = 256 * 0.6)?" this is specific to my application do not bother yourself with it.

"how could we get the focal-adjusted metric depth map instead of the focal-adjusted point cloud?" Instead of using the point cloud, normalise the tensor map of the RGBD and multiply with 255 to get a colormap. Utilise the script in the inference file to do this.

1ssb avatar Jan 31 '24 14:01 1ssb

Hi @DiTo97, "why changing and why fixing the scaling factor (fy = 256 * 0.6)?" this is specific to my application do not bother yourself with it.

"how could we get the focal-adjusted metric depth map instead of the focal-adjusted point cloud?" Instead of using the point cloud, normalise the tensor map of the RGBD and multiply with 255 to get a colormap. Utilise the script in the inference file to do this.

Good to know, @1ssb.

As for the second question, I meant whether it is necessary to adjust the focal length of the generated metric depth map or not. I see you resizing the depth map and then re-scaling by the desired focal length before projecting to a point cloud. Maybe it's related to the specific use case you mentioned, e.g., are the focal lengths you put in for x and y the ones from your RGB camera intrinsics?

To sum up: if I provide the model with an RGB image and want the generated metric depth map at the same resolution by interpolating afterwards, should I just do the interpolation, or also re-scale the depth values to account for the resolution change? And in general, even if I don't change the depth map resolution, can I trust the metric values as generated, or should I apply some focal-length re-scaling depending on the RGB camera I am using?

DiTo97 avatar Jan 31 '24 15:01 DiTo97

@DiTo97 Yes, you are right, and yes, you can trust the values anywhere. An RGBD image is one where every pixel has a depth. The resizing needs an interpolation, which I have already done for you in my script, so don't worry about it; just update the global values appropriately. You simply need to convert the per-pixel depth into depth in 3D (as viewed from a point), hence the transformation.
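
To make the resolution point concrete, here is a sketch based on the standard pinhole model (not repo code): resizing the depth map does not change the depth values, but the intrinsics used for back-projection must be scaled by the same factor as the resolution.

def scale_intrinsics(fx, fy, cx, cy, orig_w, orig_h, new_w, new_h):
    # Intrinsics calibrated at (orig_w, orig_h), depth map used at (new_w, new_h).
    # The depth values themselves are not rescaled.
    sx, sy = new_w / orig_w, new_h / orig_h
    return fx * sx, fy * sy, cx * sx, cy * sy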

1ssb avatar Feb 01 '24 01:02 1ssb

@1ssb Thank you for your contribution! May I ask whether the weights (xxx.pt) in your script were fine-tuned on your own dataset? Could I load the original Depth Anything weights (xxxxx_vits14.pth)?

zhongqiu1245 avatar Feb 01 '24 03:02 zhongqiu1245

Hi @zhongqiu1245, I have not checked with any other pretrained resource. Remember that this model follows the ZoeDepth style rather than the relative-scale one, so the transformer weights (vits.pth) probably would not work if the model's last step is a layer norm (which in all probability gives you an unsqueezed, activated output), since that will in general be followed by a normalisation somewhere. Sorry for the long sentences.

1ssb avatar Feb 01 '24 03:02 1ssb

Thanks @1ssb, great contribution. I tried your code and successfully got a .ply file, but how can I open it? Windows 10's default tool doesn't work with it.

SilenceGoo avatar Feb 01 '24 04:02 SilenceGoo

I am glad it works. Make sure the file is not corrupted; I generally use MeshLab and it works well.
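
Alternatively, since Open3D is already a dependency of the script, the .ply can be inspected directly from Python (the path below is a placeholder):

import open3d as o3d

pcd = o3d.io.read_point_cloud("./my_test/output/example.ply")
o3d.visualization.draw_geometries([pcd])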


1ssb avatar Feb 01 '24 04:02 1ssb

Thanks @1ssb again; my bad, CloudCompare did the trick.

SilenceGoo avatar Feb 01 '24 04:02 SilenceGoo

@1ssb thank you!

zhongqiu1245 avatar Feb 01 '24 11:02 zhongqiu1245

@1ssb I took 4 images, keeping the object at different distances from the camera (40 cm, 50 cm, 60 cm, and 70 cm). I have the camera intrinsics (FY = 822.59804231766066, FX = 838.14270160166848). When I use the NYU indoor checkpoint to test this with the code you provided, the metric distances in the point cloud do not match the true distances. Any thoughts on what could be going wrong?

[Images: 40cm.jpg, 50cm.jpg, 60cm.jpg, 70cm.jpg]

abhishekmonogram avatar Feb 01 '24 15:02 abhishekmonogram

For NYU, make sure you set the NYU dataset flag to True so that the correct focal length, which I have provided directly in the script, is used. If you use your own fx and fy values, you will, as expected, get different distances from the origin.
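
For a custom (non-NYU) camera like the one above, the relevant globals in the script would look roughly like this; the intrinsics are the ones quoted in the question, while the principal point and output size are placeholders that should come from the calibration (and be rescaled as discussed earlier if the resolution changes):

NYU_DATA = False
FX = 838.14270160166848
FY = 822.59804231766066
FINAL_WIDTH, FINAL_HEIGHT = 1280, 720   # placeholder: resolution the intrinsics were calibrated at
P_x, P_y = 640, 360                     # placeholder: principal point from the calibration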


1ssb avatar Feb 01 '24 20:02 1ssb