
KeyError: 'architectures'

CachCheng opened this issue 1 year ago

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Fine-tuning InternVL2-2B crashes when the Trainer saves a checkpoint. During save_pretrained, transformers calls _get_non_default_generation_parameters(), which instantiates the config class with no arguments, so InternVLChatConfig.__init__ runs with vision_config: None and llm_config: None, and the lookup llm_config['architectures'][0] raises KeyError: 'architectures'. The full traceback is in the "Error traceback" section below.

Reproduction

GPUS=4 sh shell/internvl2.0/2nd_finetune/internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_lora.sh

Environment

OpenGVLab/InternVL2-2B

Error traceback

vision_config:  None
llm_config:  None
[rank0]: Traceback (most recent call last):
[rank0]:   File "/meta/cash/llm/cvllm/internvl_chat/internvl/train/internvl_chat_finetune.py", line 847, in <module>
[rank0]:     main()
[rank0]:   File "/meta/cash/llm/cvllm/internvl_chat/internvl/train/internvl_chat_finetune.py", line 832, in main
[rank0]:     train_result = trainer.train(resume_from_checkpoint=checkpoint)
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2052, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2467, in _inner_training_loop
[rank0]:     self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2918, in _maybe_log_save_evaluate
[rank0]:     self._save_checkpoint(model, trial, metrics=metrics)
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 3008, in _save_checkpoint
[rank0]:     self.save_model(output_dir, _internal_call=True)
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 3610, in save_model
[rank0]:     self._save(output_dir, state_dict=state_dict)
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 3727, in _save
[rank0]:     self.model.save_pretrained(
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2610, in save_pretrained
[rank0]:     misplaced_generation_parameters = model_to_save.config._get_non_default_generation_parameters()
[rank0]:   File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/configuration_utils.py", line 1030, in _get_non_default_generation_parameters
[rank0]:     default_config = self.__class__()
[rank0]:   File "/meta/cash/llm/cvllm/internvl_chat/internvl/model/internvl_chat/configuration_internvl_chat.py", line 54, in __init__
[rank0]:     if llm_config['architectures'][0] == 'LlamaForCausalLM':
[rank0]: KeyError: 'architectures'
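
For reference, the snippet below is a minimal, standalone sketch of the failure path (illustrative only, not code from this repository): while saving a checkpoint, this transformers version instantiates the config class with no arguments, so an __init__ written like InternVL's receives llm_config=None and the architectures lookup fails.

# Standalone sketch of the failure mode (hypothetical names, for illustration only).
def fake_internvl_config_init(vision_config=None, llm_config=None):
    llm_config = llm_config or {}          # old behaviour: fall back to an empty dict
    return llm_config['architectures'][0]  # empty dict has no 'architectures' key

try:
    # transformers' _get_non_default_generation_parameters() effectively calls Config()
    fake_internvl_config_init()
except KeyError as err:
    print('KeyError:', err)                # KeyError: 'architectures'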

CachCheng avatar Sep 30 '24 07:09 CachCheng

Same here. I think there are compatibility issues with newer versions of the libraries. I suspected it was the multi-GPU setup, but when I went from 8 GPUs down to 1 GPU, the error still occurred sometimes.

Even when I used the exact library versions from the requirements file, I would get error code -9 (SIGKILL) in the multi-GPU setting.

Could the authors update the library versions / verify the scripts, please? Thank you very much! @czczup @whai362 @opengvlab-admin

JiazhengChai avatar Oct 04 '24 23:10 JiazhengChai

I also hit a similar error when using lmdeploy to apply AWQ quantization to InternVL2-4B (error screenshot attached).

Mrgengli avatar Oct 05 '24 10:10 Mrgengli

Downgrading to transformers==4.44.2 solved this issue for me.

https://github.com/modelscope/ms-swift/issues/2180#issuecomment-2390606972
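
For anyone applying the downgrade, a quick sanity check (generic Python, nothing InternVL-specific) that the training environment really picks up the pinned version:

# Confirm the interpreter used for training sees the downgraded transformers build.
import transformers

print(transformers.__version__)            # expected: 4.44.2 after the downgrade
assert transformers.__version__.startswith('4.44'), 'still on a newer transformers build'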

seongminp avatar Oct 08 '24 08:10 seongminp

@seongminp Thank you very much, my problem is solved.

Mrgengli avatar Oct 08 '24 08:10 Mrgengli

Downgrading to transformers==4.44.2 solved this issue for me.

modelscope/ms-swift#2180 (comment)

Thank you for your assistance, the issue has been successfully resolved.

Inkyl avatar Oct 13 '24 09:10 Inkyl

Downgrading to transformers==4.44.2 solved this issue for me.

modelscope/ms-swift#2180 (comment)

Thank you very much, this method is effective!

Mir-Sailor avatar Dec 07 '24 12:12 Mir-Sailor

Hi, I recently fixed this issue; you should be able to use the new version now.

czczup avatar Dec 08 '24 14:12 czczup

Hi, I recently fixed this issue; you should be able to use the new version now.

It still fails today; my transformers version is 4.47.0.

duoyw avatar Dec 16 '24 07:12 duoyw

Hi, I recently fixed this issue; you should be able to use the new version now.

It still fails today; my transformers version is 4.47.0.

Hello, are you using OpenGVLab/InternVL2-xxxB to load the model, or did you download the model locally? I updated all the model code on HuggingFace 10 days ago and added the following code to fix this bug:

(screenshot of the added fix in configuration_internvl_chat.py)
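
The code in the screenshot is, in rough outline, a guard that substitutes default config dicts when vision_config or llm_config is None. The following is a hedged sketch with assumed default values and a simplified class, not the verbatim HuggingFace code:

import logging

logger = logging.getLogger(__name__)

class InternVLChatConfigSketch:
    # Simplified stand-in for InternVLChatConfig.__init__ showing only the added guard.
    def __init__(self, vision_config=None, llm_config=None, **kwargs):
        # Fall back to defaults instead of indexing into None, so that Config() with
        # no arguments (as transformers does while saving a checkpoint) still works.
        if vision_config is None:
            vision_config = {'architectures': ['InternVisionModel']}
            logger.info('vision_config is None, using default InternVisionModel settings.')
        if llm_config is None:
            llm_config = {'architectures': ['InternLM2ForCausalLM']}
            logger.info('llm_config is None, using default InternLM2ForCausalLM settings.')

        if llm_config['architectures'][0] == 'LlamaForCausalLM':
            ...  # build a LlamaConfig
        else:
            ...  # build the matching LLM config (e.g. InternLM2)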

czczup avatar Dec 16 '24 07:12 czczup