Error(s) in loading state_dict for ZoeDepthNK

Open minuenergy opened this issue 2 years ago • 13 comments

I want to fine-tune the head a second time: it was first trained on the Carla dataset (my custom data), and now I want to train on NYU-KITTI with train_mix.py.

I use this command:

python train_mix.py -m zoedepth_nk --pretrained_resource="local::Carla/ZoeDepthv1_13-Jul_04-44-e6e03405a1f8_best.pt"

but I get the error in the title ("Error(s) in loading state_dict for ZoeDepthNK"); a screenshot of the traceback is attached.
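
For reference, a rough sketch of how the mismatched tensors can be listed outside the training script. It builds ZoeDepthNK with the get_config / build_model helpers from the README and compares shapes against my local checkpoint (the "model" key fallback is an assumption about how the checkpoint dict is laid out):

import torch
from zoedepth.utils.config import get_config
from zoedepth.models.builder import build_model

# Build ZoeDepthNK as in the README (this fetches the released NK weights once;
# here it is only used to read off the expected tensor shapes).
config = get_config("zoedepth_nk", "infer")
model = build_model(config)
model_state = model.state_dict()

# My local checkpoint; released ZoeDepth checkpoints nest the weights under a
# "model" key, so fall back to the raw dict if that key is absent.
ckpt = torch.load("Carla/ZoeDepthv1_13-Jul_04-44-e6e03405a1f8_best.pt", map_location="cpu")
state = ckpt.get("model", ckpt)

for name, tensor in state.items():
    if name not in model_state:
        print("unexpected key:", name)
    elif model_state[name].shape != tensor.shape:
        print("size mismatch:", name, tuple(tensor.shape), "vs", tuple(model_state[name].shape))

for name in model_state:
    if name not in state:
        print("missing key:", name)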

minuenergy avatar Jul 20 '23 05:07 minuenergy

I'm seeing the same error.

TopAImaster avatar Jul 26 '23 01:07 TopAImaster

@nuistAImaster I figured it out myself: this error is caused by a mismatch between the mono-head models (ZoeDepth_N, ZoeDepth_K) and the multi-head model (ZoeDepth_NK).

minuenergy avatar Jul 26 '23 01:07 minuenergy

@minuenergy If you have trained the model, how can you use the trained parameters to load the model?

TopAImaster avatar Jul 26 '23 02:07 TopAImaster

Hi @minuenergy, I'm trying to use the train_mono.py script to fine-tune on the KITTI dataset, just to test.

python train_mono.py -m zoedepth -d kitti --pretrained_resource="url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_K.pt"

But I keep getting size mismatch errors from load_state_dict.

Do you have any advice?

Thanks in advance.

MACILLAS avatar Jul 26 '23 23:07 MACILLAS

I think I figured it out...

In ZoeDepth/zoedepth/models/zoedepth/config_zoedepth.json, under the model configuration...

you will need to change the n_attractors from [16, 8, 4, 1] to [32, 16, 8, 2]
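
For reference, the same edit as a tiny patch script (just a sketch; the file path and the "model" section are the ones mentioned above, so double-check them against your copy before running):

import json
import shutil

cfg_path = "zoedepth/models/zoedepth/config_zoedepth.json"
shutil.copy(cfg_path, cfg_path + ".bak")  # keep a backup of the original config

with open(cfg_path) as f:
    cfg = json.load(f)

# Values from this thread: the checkpoint being loaded apparently expects the larger attractor heads.
cfg["model"]["n_attractors"] = [32, 16, 8, 2]  # was [16, 8, 4, 1]

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=4)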

I hope this is the right way to do it... The model loads now and trains.

MACILLAS avatar Jul 27 '23 14:07 MACILLAS

Does that mean the optimal parameters are different from what is reported in the paper?

kwea123 avatar Jul 29 '23 05:07 kwea123

Hmm, I didn't use the _k model for training, so I can't offer a solution for that case. However, training on KITTI from scratch could be an alternative.
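
Another option (just a sketch, not something from the authors) is to keep only the tensors whose shapes match the current model and let the remaining layers train from their fresh initialization:

import torch
from zoedepth.utils.config import get_config
from zoedepth.models.builder import build_model

# Assumes the mono-head "zoedepth" config and a local checkpoint path of your own.
config = get_config("zoedepth", "infer")
model = build_model(config)
model_state = model.state_dict()

ckpt = torch.load("path/to/your_checkpoint.pt", map_location="cpu")
state = ckpt.get("model", ckpt)

# Keep only tensors whose shapes agree with the model; the dropped layers
# (e.g. the attractor heads) are then re-trained rather than loaded.
filtered = {k: v for k, v in state.items()
            if k in model_state and model_state[k].shape == v.shape}
print("loading", len(filtered), "tensors, re-initializing", len(model_state) - len(filtered))
model.load_state_dict(filtered, strict=False)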

minuenergy avatar Jul 29 '23 08:07 minuenergy

@minuenergy If you have trained the model, how can you use the trained parameters to load the model?

You can use the argument --pretrained_resource "local::/path/to/your_checkpoint.pt" to load your local weights (the local:: prefix, as in my command above).

minuenergy avatar Jul 29 '23 08:07 minuenergy

Hi @minuenergy, I'm a little confused about this. The authors released the pretrained models (N, K and NK); if we want to reproduce the authors' results, we need to start from a model that has not been fine-tuned, right? Do you know where I can find that? Or is it as simple as leaving --pretrained_resource blank?

Thank you all very much for your help!

MACILLAS avatar Jul 29 '23 13:07 MACILLAS

@MACILLAS @minuenergy @nuistAImaster Hello, may I ask what hardware configuration you used for training? I'm using a 3060 GPU to train on a dataset of a few hundred images, and it results in errors. Can you help? Thank you.

zhangjd1029 avatar Jul 29 '23 13:07 zhangjd1029

Hi Zhang, I think you might be in the same boat as me. I have a single A5000 (24 GB); I had to disable distributed mode and turn my batch size down to 1. I originally tried to run it on a TitanV with 12 GB of VRAM: I got NYUv2 to run (again, batch size 1), but it always failed during online validation (CUDA out of memory). With the A5000 my GPU runs at about 85% utilization and ~20 GB of VRAM, so I'd say that's about the bare minimum to train.

There is another thread here where the original authors explained they used 4x 40 GB GPUs for fine-tuning, and it takes ~2 hr for NYUv2 [#11]. Someone else also wrote that performance is affected if you lower the batch size [#43]. I wish I could get my hands on a Lambda Cloud cluster right now, but everything has been sold out since ChatGPT dropped.

MACILLAS avatar Jul 29 '23 13:07 MACILLAS

@MACILLAS Thank you for your reply. What you said is correct. I found a better machine for training and it trains successfully; it has a 4090 (the A5000 seems to benchmark around a 3070). By the way, is the trained model saved under shortcuts?

zhangjd1029 avatar Aug 01 '23 08:08 zhangjd1029

pip install timm==0.6.7

songchangshun avatar Aug 08 '24 13:08 songchangshun