MPT to NeMo conversion

Open ethanhe42 opened this issue 5 months ago • 3 comments

This script converts the Hugging Face MPT-7B checkpoint to a NeMo mcore (Transformer Engine) checkpoint.

Two things to be aware of:

  1. Standard LayerNorm is used instead of low-precision LayerNorm.
  2. LayerNorm without bias is not supported by Transformer Engine as of now.

Usage:
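One way to live with the second point is to give each bias-free LayerNorm an all-zero bias during conversion, since adding zero leaves the output unchanged. The sketch below only illustrates that equivalence in plain Python. It is an assumption about a possible workaround, not a description of what the conversion script does, and `layer_norm` and `add_zero_bias` are hypothetical helper names.

```python
import math

def layer_norm(x, weight, bias=None, eps=1e-5):
    """Plain LayerNorm over a 1-D list of floats; bias=None means bias-free."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std * w for v, w in zip(x, weight)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

def add_zero_bias(weight):
    """Hypothetical helper: build a zero bias matching the weight's length."""
    return [0.0] * len(weight)

x = [0.5, -1.0, 2.0, 3.5]
weight = [1.0, 0.9, 1.1, 1.0]

no_bias = layer_norm(x, weight)                          # bias-free, as in MPT
zero_bias = layer_norm(x, weight, add_zero_bias(weight)) # bias filled with zeros

# Both variants produce the same activations.
print(no_bias == zero_bias)
```

The zero bias costs one extra vector per LayerNorm in the converted checkpoint; whether that trade-off is acceptable depends on the target framework.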

python scripts/checkpoint_converters/convert_mpt_7b_hf_to_nemo.py

Text generation:

python examples/nlp/language_modeling/megatron_gpt_eval.py \
    gpt_model_file=mpt_7b_mcore.nemo \
    inference.greedy=True \
    trainer.devices=1 \
    trainer.num_nodes=1 \
    tensor_model_parallel_size=1 \
    pipeline_model_parallel_size=1 \
    prompts="[query: how much protein should a female eat]"

Example output:

[{'sentences': ['query: how much protein should a female eat?\nI am a female, 5\'4" and weigh about 130 lbs. I am a vegetarian and I am trying to lose weight. I'], 'tokens': [['<|endoftext|>', 'query', ':', 'Ġhow', 'Ġmuch', 'Ġprotein', 'Ġshould', 'Ġa', 'Ġfemale', 'Ġeat', '?', 'Ċ', 'I', 'Ġam', 'Ġa', 'Ġfemale', ',', 'Ġ5', "'", '4', '"', 'Ġand', 'Ġweigh', 'Ġabout', 'Ġ130', 'Ġlbs', '.', 'ĠI', 'Ġam', 'Ġa', 'Ġvegetarian', 'Ġand', 'ĠI', 'Ġam', 'Ġtrying', 'Ġto', 'Ġlose', 'Ġweight', '.', 'ĠI']], 'logprob': None, 'full_logprob': None, 'token_ids': [[0, 7267, 27, 849, 1199, 2601, 943, 247, 5343, 6008, 32, 187, 42, 717, 247, 5343, 13, 608, 8, 21, 3, 285, 14357, 670, 11084, 38818, 15, 309, 717, 247, 39203, 285, 309, 717, 2820, 281, 7168, 2801, 15, 309]], 'offsets': [[0, 0, 5, 6, 10, 15, 23, 30, 32, 39, 43, 44, 45, 46, 49, 51, 58, 59, 61, 62, 63, 64, 68, 74, 80, 84, 88, 89, 91, 94, 96, 107, 111, 113, 116, 123, 126, 131, 138, 139]]}]
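To make the structure of this output easier to read: judging from the sample, each entry in 'offsets' appears to be the starting character position of the corresponding token within the sentence. The sketch below checks that reading on a truncated prefix of the output above (first 13 tokens only). The interpretation of 'offsets' is inferred from the sample rather than from the NeMo documentation, and `token_spans` is a hypothetical helper.

```python
# Truncated prefix of the example output above (first 13 tokens only).
sentence = 'query: how much protein should a female eat?\nI'
tokens = ['<|endoftext|>', 'query', ':', 'Ġhow', 'Ġmuch', 'Ġprotein',
          'Ġshould', 'Ġa', 'Ġfemale', 'Ġeat', '?', 'Ċ', 'I']
offsets = [0, 0, 5, 6, 10, 15, 23, 30, 32, 39, 43, 44, 45]

def token_spans(sentence, offsets):
    """Slice the sentence at each start offset; the last span runs to the end."""
    ends = offsets[1:] + [len(sentence)]
    return [sentence[start:end] for start, end in zip(offsets, ends)]

spans = token_spans(sentence, offsets)

# Show each BPE token next to the text it covers.
for tok, span in zip(tokens, spans):
    print(f"{tok!r:>16} -> {span!r}")
```

Under this reading the leading '<|endoftext|>' token covers an empty span, and 'Ġ' and 'Ċ' in the token strings stand for a leading space and a newline respectively.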

This PR depends on https://github.com/NVIDIA/Megatron-LM/pull/668

ethanhe42, Jan 12 '24 00:01

@yaoyu-33

ethanhe42, Jan 18 '24 01:01

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions[bot], Feb 09 '24 01:02

This PR was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], Feb 16 '24 01:02

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions[bot], Mar 03 '24 01:03

This PR was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], Mar 10 '24 01:03