
deepspeed_to_megatron several issues

Open MatejUlcar opened this issue 2 years ago • 4 comments

  1. A recent commit removed tools/convert_checkpoint/deepspeed_checkpoint.py, but tools/convert_checkpoint/deepspeed_to_megatron.py still tries to import it. The other scripts in the folder appear to be fine. I guess the import on line 7 should be changed from "from .deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint" to "from deepspeed.checkpoint.deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint"?

  2. https://github.com/microsoft/Megatron-DeepSpeed/issues/91 applies here as well.

  3. The function _renest_sd, defined on line 90, splits each key on its dots, but I get ValueError: too many values to unpack (expected 2) because one of the keys is named word_embeddings.norm.weight, which contains two dots. There are probably more such layer names. I'm trying to convert my own model's bf16_zero (pp=1, tp=1) checkpoints to Megatron and/or HF format. It is entirely possible that this is all my own mistake; if so, please advise on how to convert successfully.
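To illustrate point 3, here is a minimal sketch of the failure and one possible workaround. It is not the actual code from deepspeed_to_megatron.py: the helper name renest_first_dot and the toy state dict are hypothetical, and splitting on the first dot only is an assumption about what the downstream loader would accept (it leaves "norm.weight" as a single inner key).

```python
from collections import OrderedDict

# Reproduce the reported error: unpacking a three-part key into two names.
try:
    outer, inner = "word_embeddings.norm.weight".split(".")
    error = None
except ValueError as exc:
    error = str(exc)


def renest_first_dot(sd):
    """Hypothetical variant of a re-nesting helper.

    Splits each key on the FIRST dot only, so a key such as
    'word_embeddings.norm.weight' becomes
    new_sd['word_embeddings']['norm.weight'].
    Keys without any dot are kept at the top level unchanged.
    """
    new_sd = OrderedDict()
    for key, value in sd.items():
        outer, sep, inner = key.partition(".")
        if not sep:
            # No dot in the key: nothing to nest.
            new_sd[key] = value
            continue
        new_sd.setdefault(outer, OrderedDict())[inner] = value
    return new_sd


# Toy state dict; integers stand in for tensors.
flat = OrderedDict([
    ("word_embeddings.weight", 1),
    ("word_embeddings.norm.weight", 2),
    ("position_embeddings.weight", 3),
])
nested = renest_first_dot(flat)
```

Whether Megatron then loads a nested key such as "norm.weight" correctly is untested here, so treat this only as a way to get past the unpacking error.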

MatejUlcar, Oct 14 '22 15:10

same question.

huybery, Feb 12 '23 03:02

same question.

Rosenberg37, Mar 02 '23 06:03

same question.

yzzzwd, Apr 06 '23 09:04