[BUG]: inference.py

Open JingxinLee opened this issue 1 year ago • 8 comments

🐛 Describe the bug

python inference.py --model_path ./actor_checkpoint_prompts.pt --pretrain bloom-560m --model bloom


        size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for transformer.ln_f.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for lm_head.weight: copying a param with shape torch.Size([50257, 768]) from checkpoint, the shape in current model is torch.Size([250880, 1024]).
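
Editor's note: the shapes in this error are informative. A hidden size of 768 with a vocabulary of 50257 matches GPT-2 (small), while 1024 and 250880 match bloom-560m, which suggests the checkpoint was saved from a GPT-2-sized actor rather than from a BLOOM one. The sketch below is one way to list such mismatches before calling load_state_dict. The helper name shape_mismatches is hypothetical, and the stand-in objects only mimic the .shape attribute of real tensors; with real data you would pass the result of torch.load(path, map_location="cpu") and model.state_dict(), assuming the checkpoint file holds a plain state dict.

```python
from types import SimpleNamespace


def shape_mismatches(checkpoint_state, model_state):
    """Return {name: (checkpoint_shape, model_shape)} for entries whose shapes differ."""
    mismatches = {}
    for name, ckpt_tensor in checkpoint_state.items():
        if name not in model_state:
            continue  # missing or unexpected keys are a separate problem
        ckpt_shape = tuple(ckpt_tensor.shape)
        model_shape = tuple(model_state[name].shape)
        if ckpt_shape != model_shape:
            mismatches[name] = (ckpt_shape, model_shape)
    return mismatches


# Stand-ins for tensors, using the shapes from the error message above.
checkpoint_state = {
    "transformer.ln_f.weight": SimpleNamespace(shape=(768,)),
    "lm_head.weight": SimpleNamespace(shape=(50257, 768)),
}
model_state = {
    "transformer.ln_f.weight": SimpleNamespace(shape=(1024,)),
    "lm_head.weight": SimpleNamespace(shape=(250880, 1024)),
}

for name, (ckpt_shape, model_shape) in shape_mismatches(checkpoint_state, model_state).items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

If every mismatch shows the same pair of hidden sizes, the checkpoint and the --pretrain/--model arguments most likely refer to different architectures.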

Environment

No response

JingxinLee avatar Mar 13 '23 09:03 JingxinLee

Thanks for your feedback. Please update your code to our newest PR; this problem has been solved.

ht-zhou avatar Mar 14 '23 01:03 ht-zhou

Thanks for your feedback. Please update your code to our newest PR; this problem has been solved.

I am sure I used the latest code. The problem still exists.

JingxinLee avatar Mar 14 '23 02:03 JingxinLee

There may be some problems with the old checkpoint. I suggest training a new one for a few epochs to test.

ht-zhou avatar Mar 14 '23 02:03 ht-zhou

There may be some problems with the old checkpoint. I suggest training a new one for a few epochs to test.

How can stage 1 of ChatGPT training be implemented? Can you explain it in detail?

1a2cjitenfei avatar Mar 14 '23 03:03 1a2cjitenfei

How can stage 1 of ChatGPT training be implemented? Can you explain it in detail?

You can use a human-written dataset to fine-tune a pretrained language model like BLOOM or LLaMA.
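
Editor's note: as a rough illustration of what such supervised fine-tuning data can look like, the sketch below builds one training example from a prompt and a human-written response. It is not ColossalAI's implementation. The helper name build_sft_example and the token ids are made up for illustration (a real tokenizer would produce them), and masking the prompt out of the loss is a common but optional choice. The value -100 is the default ignore_index of torch.nn.CrossEntropyLoss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt and response; exclude prompt positions from the loss."""
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    # Only the response tokens (and the end-of-sequence token) carry a label.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    return {"input_ids": input_ids, "labels": labels}


# Hypothetical token ids standing in for tokenizer output.
example = build_sft_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=2)
print(example)
```

Batches of such examples can then be fed to a causal language model that accepts input_ids and labels, so the model learns to produce the response given the prompt.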

ht-zhou avatar Mar 14 '23 06:03 ht-zhou

Bot detected the issue body's language is not English, translate it automatically.

How can stage 1 of ChatGPT training be implemented? Can you explain it in detail?

You can use a human-written dataset to fine-tune a pretrained language model like BLOOM or LLaMA.

Issues-translate-bot avatar Mar 14 '23 06:03 Issues-translate-bot

There may be some problems with the old checkpoint. I suggest training a new one for a few epochs to test.

I trained a new one, and the problem still exists.

JingxinLee avatar Mar 14 '23 08:03 JingxinLee

How can stage 1 of ChatGPT training be implemented? Can you explain it in detail?

You can use a human-written dataset to fine-tune a pretrained language model like BLOOM or LLaMA.

How do I do that? With Hugging Face? With Hugging Face, the same GPU memory supports only 1/10 of what ColossalAI supports, so the memory problem comes back. I recommend adding stage 1.

1a2cjitenfei avatar Mar 14 '23 11:03 1a2cjitenfei

There may be some problems with the old checkpoint. I suggest training a new one for a few epochs to test.

I trained a new one, and the problem still exists.

You can refer to our test_ci script (https://github.com/hpcaitech/ColossalAI/blob/main/applications/ChatGPT/examples/test_ci.sh) and try it. We have already completed the CI process for this application, and the code has passed it, so it should not have this problem.

ht-zhou avatar Mar 20 '23 02:03 ht-zhou

How can stage 1 of ChatGPT training be implemented? Can you explain it in detail?

You can use a human-written dataset to fine-tune a pretrained language model like BLOOM or LLaMA.

How do I do that? With Hugging Face? With Hugging Face, the same GPU memory supports only 1/10 of what ColossalAI supports, so the memory problem comes back. I recommend adding stage 1.

Ok, we will support stage1 in our example soon.

ht-zhou avatar Mar 20 '23 02:03 ht-zhou

@JingxinLee Hello, sorry to disturb you. Have you found a solution to this? I've been facing the same issue for days now (plus missing keys from the PPO-saved state_dict).

sarrahbbh avatar Jun 05 '23 13:06 sarrahbbh