But I hit the same problem when training the legacy model.
@lmcafee-nvidia During conversion.
@lmcafee-nvidia Megatron conversion works, but during training we hit the exact same error as in this post. So we changed the conversion type to `--saver mcore`, but that conversion couldn't finish. We...
@lmcafee-nvidia Just another update: we also tried these two flags, `--use-legacy-models --ckpt-format torch`. Neither of the solutions you provided works for us; it still hits the state_dict error:
```
[rank4]:...
```
> [@TeddLi](https://github.com/TeddLi) You should use `spawn` as the start method for torch multiprocessing; otherwise the CUDA context cannot be properly set up. A simple way to fix it is to just add...
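A minimal sketch of what that fix typically looks like, assuming a simple per-GPU worker (the `worker` function and its body are illustrative, not from this thread):

```python
import torch
import torch.multiprocessing as mp

def worker(rank: int) -> None:
    # Each spawned process sets up its own CUDA context on its GPU.
    torch.cuda.set_device(rank)
    print(f"rank {rank} running on device {torch.cuda.current_device()}")

if __name__ == "__main__":
    # "spawn" launches fresh interpreter processes, so no CUDA state is
    # inherited from the parent (unlike the default "fork" on Linux).
    mp.set_start_method("spawn", force=True)
    mp.spawn(worker, nprocs=torch.cuda.device_count())
```

Note that `mp.spawn` already uses the spawn start method internally, so replacing a raw `mp.Process` launch with `mp.spawn` has the same effect.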
> Let's keep the discussion on GitHub for now. Did you consider making a reproducible example? If you set up a script based on a public checkpoint, I can try to...