Megatron-DeepSpeed
Issue to gather fixes and new features to send upstream
Please edit the OP to add any fixes we applied to the core that need to be propagated upstream into:
- https://github.com/microsoft/Megatron-DeepSpeed
- https://github.com/NVIDIA/Megatron-LM
We want to do this to make it easier to sync upstream changes back into this repo.
Changes to send upstream:
Bug fixes:
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/9189c4eb646b08493b5d47d00d0e30280110ecde Fix a bug when restarting with no eval in round 1
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/0125aaa7cd83b7c078e373145d1dd9a28be6b50c Fix merge functions to take `doc_ids` into account
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/9e75429deca5332398c231dccdf430d97f7ab2bd Fix TensorBoard logging with the correct rank condition, as detailed in this issue on the Megatron-LM repo. Relevant part of the PR: https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/9e75429deca5332398c231dccdf430d97f7ab2bd#diff-e2b248f8c422a601bcb0b7d93f96c1dff070f2694737e2b69f1def64ab9c1844R589
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/56c2983b167e9c8bd7b38b953a90096822ae76b7 - fixes issues with the `--rampup-batch-size` help entry
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/c680954e0b13232abd5b72711f2032bea1ad65c9 Fix document offset when merging documents, a bug introduced in https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/0125aaa7cd83b7c078e373145d1dd9a28be6b50c. Thanks @adammoody!
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/648ee17fc1b32ade877d7cadec79a9079b9826ac - check whether `python3-config` is available and clearly assert if it isn't when building the helpers fails
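For whoever ports the document-offset fix: the core idea is that when two preprocessed indexes are concatenated, the second index's document boundaries must be shifted by the number of sequences already present. Below is a minimal sketch, assuming an index that stores per-sequence `sizes` plus a `doc_idx` list of document start positions that begins with 0 and ends with `len(sizes)`. The function name and the plain-list layout are hypothetical and are not the repo's actual builder code.

```python
from typing import List, Tuple


def merge_indices(
    a_sizes: List[int], a_doc_idx: List[int],
    b_sizes: List[int], b_doc_idx: List[int],
) -> Tuple[List[int], List[int]]:
    """Hypothetical merge of two indexes (assumed layout, see lead-in).

    doc_idx[i] is the position in `sizes` of the first sequence of
    document i; doc_idx[0] == 0 and doc_idx[-1] == len(sizes).
    """
    # Number of sequences already in the first index.
    offset = len(a_sizes)
    sizes = a_sizes + b_sizes
    # Skip b's leading 0 (it coincides with a's end marker, which equals
    # `offset`) and shift every remaining boundary by `offset`.
    # Forgetting the shift yields non-monotonic, wrong document offsets.
    doc_idx = a_doc_idx + [offset + d for d in b_doc_idx[1:]]
    return sizes, doc_idx


# Example: a has two documents (seqs 0-1 and seq 2), b has two (one seq each).
sizes, doc_idx = merge_indices([3, 4, 5], [0, 2, 3], [6, 7], [0, 1, 2])
print(sizes, doc_idx)
```

A quick sanity check after any merge is that `doc_idx` is strictly increasing and ends at `len(sizes)`.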
New functionality:
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/18201ce4cb79a42e9cfe434a9a693a305af679be, https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/e8fcbae2eef401be915443971771bb8efe0d35f6, https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/c680954e0b13232abd5b72711f2032bea1ad65c9 add `tools/merge_preprocessed_data.py` to support merging datasets; it might be easier to just copy the new script.
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/7b9988146881f6eee33f69c28a92ae03e2678e42 - new faster preprocessing script for when one has many CPU cores.
- https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/5069622a9948c93d144261ffc4c9284908d901de - new preprocessing script that uses HuggingFace Datasets as source
- Curriculum learning: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/132 + https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/133
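For reviewers unfamiliar with the curriculum learning feature, one common form is a sequence-length warmup. Training starts on short sequences and grows linearly to the full length over a fixed number of steps, with the length rounded down to a multiple of some step size. The sketch below assumes that sequence-length-based, linear schedule (similar in spirit to DeepSpeed's `fixed_linear` curriculum). The function and parameter names are hypothetical and are not taken from PRs 132/133.

```python
def seqlen_for_step(step: int, total_steps: int, min_seqlen: int,
                    max_seqlen: int, seqlen_step: int) -> int:
    """Hypothetical linear sequence-length warmup schedule."""
    if step >= total_steps:
        # Warmup finished: train on full-length sequences.
        return max_seqlen
    frac = step / total_steps
    # Linear interpolation between the minimum and maximum length.
    seqlen = int(min_seqlen + frac * (max_seqlen - min_seqlen))
    # Round down to a multiple of seqlen_step (e.g. 8 for GPU-friendly shapes).
    seqlen -= seqlen % seqlen_step
    # Clamp so rounding never drops below the configured minimum.
    return max(min_seqlen, min(seqlen, max_seqlen))


for s in (0, 250, 500, 1000):
    print(s, seqlen_for_step(s, 1000, 64, 2048, 8))
```

In such a setup, the training loop would typically truncate each batch's tokens (and labels, masks, position ids) to the returned length before the forward pass.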