RuntimeError: Error(s) in loading state_dict for ResNet

Open dszpr opened this issue 1 year ago • 7 comments

Hi! Thanks a lot for the excellent work! During instruction finetuning, I encountered an error:

WARNING:root:Pytorch pre-release version 1.14.0a0+410ce96 - assuming intent to test it
/usr/local/lib/python3.8/dist-packages/diffusers/models/cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
  deprecate(
| distributed init (rank 0, world 1): env://
[1704792300.373239] [7771d2eff014:2391 :f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
Traceback (most recent call last):
  File "train.py", line 103, in <module>
    main()
  File "train.py", line 94, in main
    model = task.build_model(cfg)
  File "/workspace/code/LMDrive/LAVIS/lavis/tasks/drive.py", line 35, in build_model
    return model_cls.from_config(model_config)
  File "/workspace/code/LMDrive/LAVIS/lavis/models/drive_models/drive.py", line 575, in from_config
    model = cls(
  File "/workspace/code/LMDrive/LAVIS/lavis/models/drive_models/drive.py", line 87, in __init__
    self.visual_encoder.load_state_dict(pretrain_weights, strict=True)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1918, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ResNet:
    Missing key(s) in state_dict:...

Your original config file "notice_llava15_visual_encoder_r50_seq40.yaml" contains:

preception_model: memfuser_baseline_e1d3_return_feature

This causes "RuntimeError: Unknown model (memfuser_baseline_e1d3_return_feature)". So I changed 'memfuser_baseline_e1d3_return_feature' to 'resnet50', and then the above 'RuntimeError: Error(s) in loading state_dict for ResNet:' occurred. Do you know how to fix this?

I also noticed another error: "vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device". Does it have something to do with my failure? Many thanks, and I look forward to your reply.
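For context, load_state_dict with strict=True fails whenever the key names of the model and the checkpoint differ, which is what one would expect after swapping in a different architecture such as resnet50. A minimal sketch of that comparison, using plain key lists; the key names below are hypothetical and chosen only for illustration:

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, the two groups a strict load reports."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    # Keys the model expects but the checkpoint does not provide.
    missing = sorted(model_keys - checkpoint_keys)
    # Keys the checkpoint provides but the model does not know.
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, for illustration only.
resnet_keys = ["conv1.weight", "bn1.weight", "fc.weight"]
checkpoint_keys = ["rgb_backbone.conv1.weight", "rgb_backbone.bn1.weight"]

missing, unexpected = diff_state_dict_keys(resnet_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With real objects, the same helper could be called as diff_state_dict_keys(self.visual_encoder.state_dict().keys(), pretrain_weights.keys()), using the names that appear in the traceback.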

dszpr • Jan 09 '24 09:01

Hi! It looks like you have not installed the vision encoder correctly. The model name should be memfuser_baseline_e1d3_return_feature, instead of ResNet.

"vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device"

Maybe you don't have enough disk space?
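Both possibilities can be checked quickly. On Linux, inotify_add_watch can also report "No space left on device" when the per-user inotify watch limit is reached, so it may be worth looking at that limit as well as the free disk space. A minimal sketch, assuming a Linux host; the helper names are made up for this example:

```python
import shutil
import tempfile
from pathlib import Path


def free_gib(path):
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def inotify_watch_limit(proc_file="/proc/sys/fs/inotify/max_user_watches"):
    """Per-user inotify watch limit on Linux, or None if it cannot be read."""
    try:
        return int(Path(proc_file).read_text().strip())
    except (OSError, ValueError):
        return None


# tempfile.gettempdir() is usually /tmp on Linux, the directory named in the error.
tmp_dir = tempfile.gettempdir()
print(f"free space in {tmp_dir}: {free_gib(tmp_dir):.1f} GiB")
print("inotify max_user_watches:", inotify_watch_limit())
```

If the disk has plenty of free space, the watch limit is the more likely cause of that particular message. Either way, it is a separate problem from the state_dict error.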

You can try the following steps:

  1. pip uninstall timm
  2. cd vision_encoder/
  3. python setup.py develop
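After these steps, one way to confirm that the bundled vision encoder is the timm that Python picks up is to ask timm whether the model name is registered. A minimal sketch; model_is_registered is a hypothetical helper, and it only assumes that the installed package exposes list_models() as timm does:

```python
import importlib


def model_is_registered(model_name, package="timm"):
    """Return True or False if `package` imports, or None when it cannot be imported."""
    try:
        pkg = importlib.import_module(package)
    except ImportError:
        return None
    # list_models() returns the names of all registered model entrypoints.
    return model_name in pkg.list_models()


status = model_is_registered("memfuser_baseline_e1d3_return_feature")
messages = {
    None: "timm is not importable",
    True: "model is registered",
    False: "timm imports, but the model is not registered",
}
print(messages[status])
```

The last outcome would suggest that a stock timm is being imported instead of the one under vision_encoder/.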

deepcs233 • Jan 09 '24 16:01

Hi! After uninstalling timm and running 'python setup.py develop' in vision_encoder, the timm module can't be found: ModuleNotFoundError: No module named 'timm'. I also tried 'pip install -e .' to install vision_encoder, and the log printed 'Successfully installed timm'. However, timm still doesn't show up in 'pip list', so it seems the module wasn't installed properly either way. I run the project in Docker instead of a conda environment. Does this have something to do with the failure to install vision_encoder?
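A mismatch like this can happen when pip and python belong to different environments. A minimal sketch for checking what the running interpreter actually sees; describe_module is a hypothetical helper name, and the code uses only the standard library (Python 3.8+):

```python
import importlib.util
import sys
from importlib import metadata


def describe_module(name):
    """Report the interpreter in use, where `name` would be imported from, and its installed version."""
    spec = importlib.util.find_spec(name)
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,
        "location": spec.origin if spec else None,
        "version": version,
    }


print(describe_module("timm"))
```

Running 'python -m pip list' with the same interpreter, instead of a bare 'pip list', makes sure that pip and python refer to the same environment.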

dszpr • Jan 10 '24 00:01

Hi! I just created a blank conda env and installed the package. The following script runs fine:

import timm
timm.create_model('memfuser_baseline_e1d3_return_feature')

Maybe you need to create a conda env in your docker?

deepcs233 • Jan 10 '24 07:01

Also, I have updated the Setup section in the README and some files in the repo. Please use the latest version.

deepcs233 • Jan 10 '24 09:01

I ran the following script and got an Unknown Model error. Is it because timm's model names changed?

import timm
timm.create_model('memfuser_baseline_e1d3_return_feature')

kongdehong • Apr 25 '24 08:04

pip install timm==0.4.13 @dszpr @deepcs233

Dagoli • Jun 03 '24 11:06
