Megatron-DeepSpeed
deepspeed_to_megatron several issues
- A recent commit removed tools/convert_checkpoint/deepspeed_checkpoint.py, but tools/convert_checkpoint/deepspeed_to_megatron.py still tries to import it. The other scripts in the folder appear to be OK. I guess the import on line 7 should be changed from
from .deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint
to
from deepspeed.checkpoint.deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint
?
- https://github.com/microsoft/Megatron-DeepSpeed/issues/91 applies here as well.
- The function _renest_sd, defined on line 90, splits each key on dots and unpacks the result into two names, but I get ValueError: too many values to unpack (expected 2) because one of the keys is named word_embeddings.norm.weight, which contains two dots. There are probably more such layer names.
I'm attempting to convert my own model's bf16_zero (pp=1, tp=1) checkpoints to Megatron and/or HF format. It is entirely possible this is all my own mistake. If so, please advise on how to convert them successfully.
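For the _renest_sd error, here is a minimal sketch of one possible workaround: split each key at the first dot only, so that the remainder (for example norm.weight) stays together as the inner key. The function name renest_sd_first_dot and the sample keys are hypothetical, and this is not the repository's code. Whether the downstream Megatron loader accepts an inner key such as norm.weight is an assumption I have not verified.

```python
from collections import OrderedDict


def renest_sd_first_dot(sd):
    """Hypothetical variant of a re-nesting helper.

    Nests a flat state dict one level deep, splitting each key at the
    FIRST dot only. Assumption: the consumer tolerates inner keys that
    still contain dots (e.g. "norm.weight").
    """
    nested = OrderedDict()
    for key, value in sd.items():
        # partition() always returns exactly three parts, so a key with
        # two or more dots can no longer trigger an unpacking ValueError.
        head, sep, tail = key.partition(".")
        if not sep:
            # Key has no dot at all: keep it at the top level unchanged.
            nested[key] = value
            continue
        # Group all entries sharing the same prefix under one sub-dict
        # instead of overwriting the previous entry.
        nested.setdefault(head, OrderedDict())[tail] = value
    return nested


# Illustrative keys only; the integer values stand in for tensors.
flat = OrderedDict([
    ("word_embeddings.weight", 1),
    ("word_embeddings.norm.weight", 2),
    ("position_embeddings.weight", 3),
])
nested = renest_sd_first_dot(flat)
```

With the sample keys above, both word_embeddings entries end up under a single word_embeddings sub-dict, keyed by weight and norm.weight. If the loader instead expects a fully nested structure (norm as its own sub-dict), the split would have to recurse on the remaining dots.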
same question.