Metric3D
Supporting old GPUs?
Hello, thanks again for the great work! Your model uses torch.bfloat16, which is only supported by newer GPUs.
https://github.com/YvanYin/Metric3D/blob/7b5440dcbc17ef5e09805169a5f0b2d6bfe59161/mono/model/decode_heads/RAFTDepthNormalDPTDecoder5.py#L218-L229
May I ask you to kindly support older ones by adding an option to use torch.float32 instead? It could be as simple as

dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32

and then using dtype in autocast.
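For example, a minimal self-contained sketch of the proposed fallback (only the dtype selection and the autocast usage are the actual proposal; the rest is illustrative):

import torch

# Query bf16 support once and reuse the resulting dtype everywhere.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32

a = torch.randn(4, 4, device='cuda')
b = torch.randn(4, 4, device='cuda')
# Only enable autocast when the low-precision dtype is actually usable.
with torch.autocast(device_type='cuda', dtype=dtype, enabled=dtype is torch.bfloat16):
    c = a @ b
print(c.dtype)  # torch.bfloat16 on newer GPUs, torch.float32 otherwise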
Is there a workaround for this error? Do you know if replacing bfloat16 with torch.float would enable the weights to load correctly?
It works on my side to the extent that the outputs look reasonable. I don't know of any other workaround, and I'd be happy to hear if there are better options.
I think the weights will be loaded correctly because float32 should be compatible with bfloat16.
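One quick way to check this yourself (a minimal sketch, assuming the metric3d_vit_small torch hub entry point used in this thread):

import torch

model = torch.hub.load('yvanyin/metric3d', 'metric3d_vit_small', pretrain=True)
# The parameters are stored in full precision; bfloat16 only appears at
# compute time inside autocast regions, so loading itself is unaffected.
print({p.dtype for p in model.parameters()})  # expect {torch.float32}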
Hi, sorry for commenting on a closed issue, but if we are using the torch hub model, how do we modify it to use float?
# where to modify code below?
model = torch.hub.load('yvanyin/metric3d', 'metric3d_vit_small', pretrain=True)
model.cuda().eval()
with torch.no_grad():
    # this line fails if the GPU does not support bfloat16
    pred_depth, confidence, output_dict = model.inference({'input': rgb})
You can make a copy of the repo and add the proposed solution. If you need the model to come from torch hub, you can upload your copy to torch hub and use it from there.
Or you can try to persuade the owner to accept the solution and add it to the repo.
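For example, torch.hub can also load from a local directory, so a modified clone works without re-uploading anything (the path below is a placeholder for your checkout):

import torch

# Point torch.hub at a local clone that contains the float32 fallback.
model = torch.hub.load('/path/to/Metric3D', 'metric3d_vit_small',
                       source='local', pretrain=True)
model.cuda().eval()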
I'm struggling with the same problem. Any solutions?
For the record, I found access to a computer with a newer GPU. Sorry :(
Hi,
I think I handled the problem. For inference with the torch hub pretrained model, changing the following code lines from bfloat16 to float16 may work. It works for me!
GPU: Jetson Nano 4 GB (ARM), CUDA 10.2, PyTorch 1.12
Here are the instructions:
cd ~/.cache/torch/hub/yvanyin_metric3d_main
(or, if you use pyenv, find the respective folder in your environment)
nano mono/model/decode_heads/RAFTDepthNormalDPTDecoder5.py
(or VS Code, whichever you like!)
Change the following lines from this:

def interpolate_float32(x, size=None, scale_factor=None, mode='nearest', align_corners=None):
    with torch.autocast(device_type='cuda', dtype=torch.bfloat16, enabled=False):
        return F.interpolate(x.float(), size=size, scale_factor=scale_factor, mode=mode, align_corners=align_corners)

new_size = (4 * flow.shape[2], 4 * flow.shape[3])
with torch.autocast(device_type='cuda', dtype=torch.bfloat16, enabled=False):
    return F.interpolate(flow, size=new_size, mode=mode, align_corners=True)
To this:
def interpolate_float32(x, size=None, scale_factor=None, mode='nearest', align_corners=None):
    with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=False):
        return F.interpolate(x.float(), size=size, scale_factor=scale_factor, mode=mode, align_corners=align_corners)

new_size = (4 * flow.shape[2], 4 * flow.shape[3])
with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=False):
    return F.interpolate(flow, size=new_size, mode=mode, align_corners=True)
This is basically changing bfloat16 to float16. It may work for training too, but I didn't try that.
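If you'd rather not edit the file by hand, here is a small sketch that applies the same substitution to the cached checkout (it assumes the default hub cache path from the cd command above):

from pathlib import Path

# Rewrite every bfloat16 autocast dtype in the decoder to float16.
path = Path.home() / '.cache/torch/hub/yvanyin_metric3d_main/mono/model/decode_heads/RAFTDepthNormalDPTDecoder5.py'
src = path.read_text()
path.write_text(src.replace('torch.bfloat16', 'torch.float16'))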
Good luck!
Hello, I would like to know the inference speed on your Jetson Nano. Also, did you use TensorRT acceleration, or do quantization and pruning?
Hi,
I don't have access to the Jetson Nano right now, so I can't answer this question. Remind me next week, or you can send a mail to check :)
I am not using the methods you discuss, but without exact measurements, I can say for the torch hub pretrained models:
480x360, small model: ~1 second per frame
480x360, large model: 7-10 seconds per frame
e-mail: [email protected]