Issue to gather Fixes + New features to send upstream

Open stas00 opened this issue 4 years ago • 1 comments

Please edit the OP to add any fixes we applied to the core that need to be propagated upstream to:

  1. https://github.com/microsoft/Megatron-DeepSpeed
  2. https://github.com/NVIDIA/Megatron-LM

We want to do this to make it easier to sync upstream changes back into this repo.

Changes to send upstream:

Bug fixes:

  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/9189c4eb646b08493b5d47d00d0e30280110ecde fix bug when restarting with no eval in round 1
  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/0125aaa7cd83b7c078e373145d1dd9a28be6b50c Fix merge functions to take doc_ids into account
  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/9e75429deca5332398c231dccdf430d97f7ab2bd Fix Tensorboard logging with correct rank condition as detailed in this issue on Megatron-LM repo. Relevant part of the PR: https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/9e75429deca5332398c231dccdf430d97f7ab2bd#diff-e2b248f8c422a601bcb0b7d93f96c1dff070f2694737e2b69f1def64ab9c1844R589
  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/56c2983b167e9c8bd7b38b953a90096822ae76b7 - fixes issues with --rampup-batch-size help entry
  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/c680954e0b13232abd5b72711f2032bea1ad65c9 Fix document offset when merging documents, a bug introduced in https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/0125aaa7cd83b7c078e373145d1dd9a28be6b50c. Thanks @adammoody !
  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/648ee17fc1b32ade877d7cadec79a9079b9826ac - check whether python3-config is available and clearly assert if it isn't when failing to build helpers
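
For reference, the idea behind the python3-config check in the last item can be sketched roughly as below. This is a minimal illustration under assumptions, not the code from the commit: `require_tool` and its `find` parameter are hypothetical names, and the only real API used is the standard-library `shutil.which`.

```python
import shutil


def require_tool(name, find=shutil.which):
    """Return the resolved path of `name`, or fail with a clear message.

    `find` is a hypothetical injection point (defaults to shutil.which)
    so the lookup can be replaced in tests.
    """
    path = find(name)
    if path is None:
        # Fail early with an actionable message instead of letting the
        # build break later with an obscure compiler error.
        raise RuntimeError(
            f"{name} was not found on PATH; it is needed to build the helpers"
        )
    return path
```

Calling `require_tool("python3-config")` before compiling would surface the missing dependency up front rather than as a confusing build failure.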

New functionality:

  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/18201ce4cb79a42e9cfe434a9a693a305af679be, https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/e8fcbae2eef401be915443971771bb8efe0d35f6, https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/c680954e0b13232abd5b72711f2032bea1ad65c9 add tools/merge_preprocessed_data.py to support merging datasets - might be easier to just copy the new script.
  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/7b9988146881f6eee33f69c28a92ae03e2678e42 - new faster preprocessing script for when one has many CPU cores.
  • https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/5069622a9948c93d144261ffc4c9284908d901de - new preprocessing script that uses HuggingFace Datasets as source
  • Curriculum learning: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/132 + https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/133
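
To illustrate why the document offset matters when merging datasets, here is a small sketch. It assumes a simplified index layout that may not match the real on-disk format: `sizes` holds one entry per sequence, and `doc_idx` starts at 0 and lists the sequence index at which each following document begins. The function name `merge_indexes` is hypothetical and is not taken from tools/merge_preprocessed_data.py.

```python
def merge_indexes(sizes_a, doc_idx_a, sizes_b, doc_idx_b):
    """Merge two (sizes, doc_idx) pairs under the simplified layout above."""
    # Every sequence of B lands after all sequences of A, so B's document
    # boundaries must be shifted by the number of sequences in A.
    offset = len(sizes_a)
    merged_sizes = list(sizes_a) + list(sizes_b)
    # Skip B's leading 0: that boundary is already A's final entry.
    merged_doc_idx = list(doc_idx_a) + [offset + d for d in doc_idx_b[1:]]
    return merged_sizes, merged_doc_idx


# A has 3 sequences in 2 documents; B has 2 sequences in 2 documents.
sizes, doc_idx = merge_indexes([5, 7, 4], [0, 2, 3], [6, 2], [0, 1, 2])
print(sizes)    # [5, 7, 4, 6, 2]
print(doc_idx)  # [0, 2, 3, 4, 5]
```

Without the offset, B's boundaries would point back into A's sequences, which is the kind of error the document-offset fix listed above addresses.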

stas00, Jul 23 '21 19:07

9189c4e and 9e75429 have been fixed. Will take a look at the rest later. Thank you!

shoeybi, Aug 17 '21 22:08