Supporting old GPUs?

Open haojiang95 opened this issue 9 months ago • 3 comments

Hello, thanks again for the great work! Your model uses torch.bfloat16, which only newer GPUs support: https://github.com/YvanYin/Metric3D/blob/7b5440dcbc17ef5e09805169a5f0b2d6bfe59161/mono/model/decode_heads/RAFTDepthNormalDPTDecoder5.py#L218-L229 Could you kindly support older GPUs by adding an option to use torch.float32 instead? It could be as simple as setting dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32 and passing dtype to autocast.
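
A minimal sketch of such a fallback, assuming the model's weights are float32. One substitution to be aware of: rather than passing torch.float32 as the autocast dtype, it simply disables autocast when bfloat16 is unavailable, so ops run in the tensors' own precision. The names choose_precision and make_autocast are hypothetical and not part of Metric3D.

```python
def choose_precision(bf16_supported: bool) -> str:
    """Return 'bfloat16' if the GPU supports it, otherwise 'float32'."""
    return "bfloat16" if bf16_supported else "float32"


def make_autocast(enabled: bool = True):
    """Build an autocast context suited to the current GPU (hypothetical helper)."""
    import torch  # imported lazily so choose_precision works without PyTorch

    supported = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    if enabled and choose_precision(supported) == "bfloat16":
        return torch.autocast(device_type="cuda", dtype=torch.bfloat16)
    # Fallback: autocast disabled, so float32 weights and inputs stay float32.
    return torch.autocast(device_type="cuda", enabled=False)
```

A call site that currently hard-codes dtype=torch.bfloat16 could then use `with make_autocast():` instead, or `with make_autocast(enabled=False):` where the original passes enabled=False. I have not checked every call site in the repo.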

haojiang95, May 02 '24 01:05

Is there a workaround for this error? Do you know whether replacing bfloat16 with torch.float still lets the weights load correctly?

testingshanu, May 02 '24 11:05

It works on my side to the extent that the outputs look reasonable. I don't know any other workaround, and I'd be happy to know if there are better options.

haojiang95, May 02 '24 20:05

I think the weights will be loaded correctly because float32 should be compatible with bfloat16.
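
To illustrate the compatibility: bfloat16 keeps float32's sign bit and 8 exponent bits and shortens the mantissa to 7 bits, so a bfloat16 value is the top 16 bits of a float32. The stdlib-only sketch below checks that every finite bfloat16 pattern survives a round trip through float32 unchanged. The helper names are made up for illustration, and narrowing uses plain truncation rather than the rounding that real hardware applies.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Widen a 16-bit bfloat16 pattern to float32 by appending 16 zero bits."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16_bits(value: float) -> int:
    """Narrow a float32 value to bfloat16 by truncation (keep the top 16 bits)."""
    return struct.unpack(">I", struct.pack(">f", value))[0] >> 16


def is_finite_pattern(bits: int) -> bool:
    """True unless the 8 exponent bits are all ones (infinity or NaN)."""
    return ((bits >> 7) & 0xFF) != 0xFF


# Every finite bfloat16 value is exactly representable in float32.
lossless = all(
    float_to_bf16_bits(bf16_bits_to_float(b)) == b
    for b in range(1 << 16)
    if is_finite_pattern(b)
)
print(lossless)  # True
```

Assuming the checkpoint stores ordinary float32 weights, autocast only changes the precision used for computation, so loading should not depend on the autocast dtype.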

JUGGHM, May 04 '24 06:05

Hi, sorry for commenting on a closed issue. But if we are using the torch hub model, how do we modify it to use float?

import torch

# Where would the change go in the code below?
model = torch.hub.load('yvanyin/metric3d', 'metric3d_vit_small', pretrain=True)
model.cuda().eval()
with torch.no_grad():
    # This call fails if the GPU does not support bfloat16.
    # rgb is the prepared input image, defined elsewhere.
    pred_depth, confidence, output_dict = model.inference({'input': rgb})

tianyilim, Jun 20 '24 07:06

You can make a copy of the repo and apply the proposed fix. If the model must still be loaded through torch hub, push your copy to your own GitHub fork and point torch.hub.load at that fork.

Or you can try to talk the owner into accepting the fix and adding it to the repo.
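
A rough sketch of loading from a patched copy, using the source argument of torch.hub.load. The helper names are hypothetical, and the path and fork name in the usage lines are placeholders. The entry point and pretrain=True are taken from the snippet in the question.

```python
import os


def hub_source_for(repo_or_dir: str) -> str:
    """Pick torch.hub's `source` value: 'local' for an existing directory, else 'github'."""
    return "local" if os.path.isdir(repo_or_dir) else "github"


def load_metric3d(repo_or_dir: str, entry: str = "metric3d_vit_small"):
    """Load the model from a patched clone or a fork (hypothetical helper)."""
    import torch  # lazy import: hub_source_for itself needs only the stdlib

    return torch.hub.load(
        repo_or_dir, entry, source=hub_source_for(repo_or_dir), pretrain=True
    )


# Usage (placeholders, not real locations):
#   model = load_metric3d("/path/to/your/patched/Metric3D")   # local clone
#   model = load_metric3d("your-username/Metric3D")           # GitHub fork
```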

haojiang95, Jun 24 '24 20:06

I'm struggling with the same problem. Any solutions?

xmeng525, Aug 21 '24 20:08

For the record, I got access to a computer with a newer GPU. Sorry :(

tianyilim, Aug 21 '24 20:08

Hi,

I think I've solved the problem. For inference with the torch hub pretrained model, changing the following lines from bfloat16 to float16 may work. It works for me!

Setup: Jetson Nano 4 GB (ARM), CUDA 10.2, PyTorch 1.12

Here are the instructions:

cd ~/.cache/torch/hub/yvanyin_metric3d_main (or, if you use pyenv, find the corresponding folder in your environment)
nano mono/model/decode_heads/RAFTDepthNormalDPTDecoder5.py (or VS Code, or whichever editor you like!)

Change the following lines from this:

def interpolate_float32(x, size=None, scale_factor=None, mode='nearest', align_corners=None):
    with torch.autocast(device_type='cuda', dtype=torch.bfloat16, enabled=False):
        return F.interpolate(x.float(), size=size, scale_factor=scale_factor, mode=mode, align_corners=align_corners)

# ... (fragment from a second function in the same file)
    new_size = (4 * flow.shape[2], 4 * flow.shape[3])
    with torch.autocast(device_type='cuda', dtype=torch.bfloat16, enabled=False):
        return F.interpolate(flow, size=new_size, mode=mode, align_corners=True)

To this:

def interpolate_float32(x, size=None, scale_factor=None, mode='nearest', align_corners=None):
    with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=False):
        return F.interpolate(x.float(), size=size, scale_factor=scale_factor, mode=mode, align_corners=align_corners)

# ... (fragment from a second function in the same file)
    new_size = (4 * flow.shape[2], 4 * flow.shape[3])
    with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=False):
        return F.interpolate(flow, size=new_size, mode=mode, align_corners=True)

In other words, the only change is bfloat16 to float16. It may work for training too, but I haven't tried that.

Good luck!
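
If you would rather not edit the file by hand, the same substitution can be scripted. This is only a sketch: the paths come from the instructions above, the function names are made up, and it rewrites every `dtype=torch.bfloat16` in the file, not just the two lines shown.

```python
from pathlib import Path

# Location taken from the instructions above; adjust if your hub cache lives elsewhere.
HUB_REPO = Path.home() / ".cache/torch/hub/yvanyin_metric3d_main"
DECODER = HUB_REPO / "mono/model/decode_heads/RAFTDepthNormalDPTDecoder5.py"


def swap_autocast_dtype(source: str) -> str:
    """Return `source` with every autocast dtype switched from bfloat16 to float16."""
    return source.replace("dtype=torch.bfloat16", "dtype=torch.float16")


def patch_file(path: Path) -> bool:
    """Patch `path` in place, keeping a .bak copy; return True if anything changed."""
    original = path.read_text()
    patched = swap_autocast_dtype(original)
    if patched == original:
        return False
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(patched)
    return True
```

Run `patch_file(DECODER)` once after the model has been downloaded. Calling torch.hub.load with force_reload=True later would fetch a fresh copy and discard the patch.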

Mio-Atse, Oct 24 '24 19:10

Hello, what inference speed do you get on your Jetson Nano? Also, did you use TensorRT acceleration, quantization, or pruning?

Linengyao, Nov 28 '24 02:11

Hi,

I don't have access to the Jetson Nano right now, so I can't give exact numbers. Remind me next week; you can send me an email to check :)

I am not using any of the methods you mention. Without exact measurements, though, I can give rough figures for the torch hub pretrained models:

480x360, small model: about 1 second per frame
480x360, large model: 7-10 seconds per frame

e-mail: [email protected]
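
For anyone who wants exact numbers on their own device, a small timing helper along these lines should do. It is a sketch with made-up names; the sync hook exists because GPU work is queued asynchronously, so the clock should only be read after the device has finished.

```python
import time


def seconds_per_frame(run_once, n_frames: int = 10, warmup: int = 2, sync=None) -> float:
    """Average wall-clock seconds per call of `run_once`.

    `sync` is an optional callable (for example torch.cuda.synchronize) invoked
    before each clock read so that queued GPU work is included in the timing.
    """
    for _ in range(warmup):  # warm-up calls are not timed
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_frames):
        run_once()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / n_frames


# Usage (model and rgb as in the snippets earlier in this thread):
#   seconds_per_frame(lambda: model.inference({'input': rgb}),
#                     sync=torch.cuda.synchronize)
```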

Mio-Atse, Nov 28 '24 07:11