
Error loading fine-tuned wizard model

Open gabriead opened this issue 1 year ago • 4 comments

Hi guys, I have used your fine-tuning script on a custom data set to fine-tune the WizardLM model. Training works without problems: the loss decreases and all the relevant model files are stored in the specified output directory. However, I checked the size of pytorch_model.bin and it is only 623 KB, so I suspect the error is in saving the model. I used "TheBloke/wizardLM-7B-HF" from HuggingFace as the base model for fine-tuning. When I try to load the model for inference using the inference_wizardlm.py script, pointing it at the output directory specified during training, it raises this error:

    Traceback (most recent call last):
      File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 132, in <module>
        fire.Fire(main)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 121, in main
        _output = evaluate(tokenizer, model, instruction)
      File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 57, in evaluate
        generation_output = model.generate(
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 1437, in generate
        return self.greedy_search(
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 2248, in greedy_search
        outputs = self(
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
        output = old_forward(*args, **kwargs)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
        outputs = self.model(
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 530, in forward
        inputs_embeds = self.embed_tokens(input_ids)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
        output = old_forward(*args, **kwargs)
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
        return F.embedding(
      File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: 'weight' must be 2-D

Where could the problem be?
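For context on why the 623 KB file points at a saving problem rather than a loading one: a 7B-parameter model stored in fp16 uses about 2 bytes per parameter, so a complete pytorch_model.bin should be on the order of 14 GB. A file this small typically contains only metadata or placeholder tensors, which is consistent with the RuntimeError above: the embedding weight that was loaded is not a real 2-D [vocab_size, hidden_size] matrix. A rough back-of-the-envelope check (the parameter count is an approximation, not a value from the thread):

```python
# Rough sanity check: why 623 KB cannot be a complete 7B checkpoint.
PARAMS = 7_000_000_000      # ~7B parameters (approximation)
BYTES_PER_PARAM = 2         # fp16 storage

expected_bytes = PARAMS * BYTES_PER_PARAM   # ~14 GB for a full model
actual_bytes = 623 * 1024                   # the 623 KB file from the issue

print(f"expected ~{expected_bytes / 1e9:.0f} GB, got {actual_bytes / 1e3:.0f} KB")
```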

gabriead avatar May 04 '23 09:05 gabriead

Have you solved this problem? I am also facing the same issue.

dongZheX avatar Jun 29 '23 09:06 dongZheX

+1

Luowaterbi avatar Jun 30 '23 15:06 Luowaterbi

I have fixed it. This error may be caused by incomplete checkpoints (check the size of the checkpoint directory).


dongZheX avatar Jun 30 '23 16:06 dongZheX


Thank you so much!

Luowaterbi avatar Jun 30 '23 16:06 Luowaterbi


I ran into the same problem. My pytorch_model.bin is also only 623 KB. Could you explain how you fixed it? Thank you!

XiangTodayEatsWhat avatar Jul 11 '23 13:07 XiangTodayEatsWhat