MPT to NeMo conversion
This script converts the MPT-7B Hugging Face checkpoint to the NeMo mcore (Transformer Engine) format.
Two things to be aware of:
- Regular LayerNorm is used instead of low-precision LayerNorm.
- LayerNorm without bias is not supported by Transformer Engine as of now.
Usage:
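To make the second caveat concrete, here is a minimal pure-Python sketch showing that a LayerNorm without a bias term produces the same output as one with an all-zero bias. Filling in a zero bias is one possible way a converter could map bias-free weights onto a LayerNorm implementation that requires a bias. Whether the conversion script does this is an assumption, not something stated in this PR, and `layer_norm` is a hypothetical helper written for illustration, not a NeMo or Transformer Engine function.

```python
import math

def layer_norm(x, weight, bias=None, eps=1e-5):
    """Reference LayerNorm over a single vector (hypothetical helper)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    # Normalize, then apply the learned per-element scale.
    out = [(v - mean) * inv_std * w for v, w in zip(x, weight)]
    # The bias term is optional; bias-free LayerNorm simply skips it.
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

x = [0.5, -1.0, 2.0, 0.25]
weight = [1.0, 0.5, 2.0, 1.5]

no_bias = layer_norm(x, weight)
zero_bias = layer_norm(x, weight, bias=[0.0] * len(x))

# Adding an all-zero bias leaves the output unchanged.
print(no_bias == zero_bias)
```

The zero bias only preserves the forward pass; if the converted model is later fine-tuned, the bias would become trainable unless it is frozen separately.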
python scripts/checkpoint_converters/convert_mpt_7b_hf_to_nemo.py
Text generation:
python examples/nlp/language_modeling/megatron_gpt_eval.py gpt_model_file=mpt_7b_mcore.nemo inference.greedy=True trainer.devices=1 trainer.num_nodes=1 tensor_model_parallel_size=1 pipeline_model_parallel_size=1 prompts="[query: how much protein should a female eat]"
Example output:
[{'sentences': ['query: how much protein should a female eat?\nI am a female, 5\'4" and weigh about 130 lbs. I am a vegetarian and I am trying to lose weight. I'], 'tokens': [['<|endoftext|>', 'query', ':', 'Ġhow', 'Ġmuch', 'Ġprotein', 'Ġshould', 'Ġa', 'Ġfemale', 'Ġeat', '?', 'Ċ', 'I', 'Ġam', 'Ġa', 'Ġfemale', ',', 'Ġ5', "'", '4', '"', 'Ġand', 'Ġweigh', 'Ġabout', 'Ġ130', 'Ġlbs', '.', 'ĠI', 'Ġam', 'Ġa', 'Ġvegetarian', 'Ġand', 'ĠI', 'Ġam', 'Ġtrying', 'Ġto', 'Ġlose', 'Ġweight', '.', 'ĠI']], 'logprob': None, 'full_logprob': None, 'token_ids': [[0, 7267, 27, 849, 1199, 2601, 943, 247, 5343, 6008, 32, 187, 42, 717, 247, 5343, 13, 608, 8, 21, 3, 285, 14357, 670, 11084, 38818, 15, 309, 717, 247, 39203, 285, 309, 717, 2820, 281, 7168, 2801, 15, 309]], 'offsets': [[0, 0, 5, 6, 10, 15, 23, 30, 32, 39, 43, 44, 45, 46, 49, 51, 58, 59, 61, 62, 63, 64, 68, 74, 80, 84, 88, 89, 91, 94, 96, 107, 111, 113, 116, 123, 126, 131, 138, 139]]}]
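The `tokens` and `offsets` fields in the output above can be related back to the generated sentence. The sketch below assumes the tokens follow the GPT-2-style byte-level BPE convention, in which `Ġ` stands for a space and `Ċ` for a newline, and that each offset is the character position at which a token starts. It handles only these two markers and the `<|endoftext|>` special token, so it is an illustration rather than a full detokenizer; `detokenize` is a hypothetical helper, not a NeMo function.

```python
# Tokens copied from the start of the example output above.
tokens = ['<|endoftext|>', 'query', ':', 'Ġhow', 'Ġmuch', 'Ġprotein',
          'Ġshould', 'Ġa', 'Ġfemale', 'Ġeat', '?', 'Ċ', 'I']

SPECIAL_TOKENS = {'<|endoftext|>'}
# Only the two byte-level markers that appear in the example are mapped.
BYTE_LEVEL_MARKERS = {'Ġ': ' ', 'Ċ': '\n'}

def detokenize(tokens):
    """Rebuild the text and the start offset of every token."""
    pieces = []
    offsets = []
    position = 0
    for token in tokens:
        offsets.append(position)
        if token in SPECIAL_TOKENS:
            continue  # special tokens contribute no characters
        piece = ''.join(BYTE_LEVEL_MARKERS.get(ch, ch) for ch in token)
        pieces.append(piece)
        position += len(piece)
    return ''.join(pieces), offsets

text, offsets = detokenize(tokens)
print(repr(text))
print(offsets)
```

For these thirteen tokens the computed offsets match the first thirteen entries of the `offsets` list in the example output.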
This PR depends on https://github.com/NVIDIA/Megatron-LM/pull/668
@yaoyu-33
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.