Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Results: 124 Megatron-DeepSpeed issues

This adds an option to launch preprocessing from an HF dataset (loaded from an Arrow file for now, as that's the use case on JZ) rather than only from JSON Lines.

This is some of the code we wrote to debug tied-embedding synchronization issues, so I'm pushing it here in case it is needed down the road. Most likely...

Just noticed in the logs that `--skip-train-iteration-range` reports only a single range when there should be two. I currently have this in the config: ``` --skip-train-iteration-range 13251-14000 16651-19500 ``` But the...
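A minimal sketch (not the repo's actual parser) of how an `argparse` flag with `nargs="+"` and a range-parsing `type` keeps every space-separated range, which is the behavior the report above expects; the function name `parse_range` is hypothetical:

```python
import argparse

def parse_range(s):
    # "13251-14000" -> (13251, 14000), treated as an inclusive range
    start, end = map(int, s.split("-"))
    if start > end:
        raise argparse.ArgumentTypeError(f"bad range: {s}")
    return (start, end)

parser = argparse.ArgumentParser()
# nargs="+" collects every space-separated value into one list,
# so both ranges from the config should be retained
parser.add_argument("--skip-train-iteration-range",
                    type=parse_range, nargs="+")

args = parser.parse_args(
    "--skip-train-iteration-range 13251-14000 16651-19500".split())
print(args.skip_train_iteration_range)  # [(13251, 14000), (16651, 19500)]
```

If the logs show only one range, the bug is likely downstream of argument parsing, e.g. in how the ranges are merged or logged.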

Closes #124. I found that a few checks mentioned in `sanity-checks.md` were already being done in `parse_args`, like `NHIDDEN % NHEADS == 0` and `GLOBAL_BATCH_SIZE % (MICRO_BATCH_SIZE * DP_SIZE) == 0`, so...
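The two divisibility checks quoted above can be sketched as a standalone helper; `sanity_check` is a hypothetical name, not the function in `parse_args`:

```python
def sanity_check(nhidden, nheads, global_bs, micro_bs, dp_size):
    # Hidden size must split evenly across attention heads
    assert nhidden % nheads == 0, \
        "NHIDDEN must be divisible by NHEADS"
    # The global batch must be a whole number of micro-batches
    # per data-parallel rank
    assert global_bs % (micro_bs * dp_size) == 0, \
        "GLOBAL_BATCH_SIZE must be divisible by MICRO_BATCH_SIZE * DP_SIZE"

# A config that passes both checks
sanity_check(nhidden=4096, nheads=32, global_bs=512, micro_bs=4, dp_size=8)
```

Running the same checks with, say, `nhidden=4097` would raise an `AssertionError` before training starts, which is the point of doing them in argument parsing rather than at first use.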

Stella pointed out how they do consistency calculations/checks in NeoX: https://github.com/EleutherAI/gpt-neox/blob/main/megatron/neox_arguments/arguments.py It'd be good for someone to study what they did on top of the base Megatron-LM and replicate anything that...

Good First Issue
Good Second Issue

I am trying to run tests on the codebase. I am using a Docker image on an AWS p3.2xlarge (Tesla V100): ``` docker pull nvcr.io/nvidia/pytorch:21.10-py3 ``` Running `python -m pip...

@DanielHesslow has opened a PR: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/212. This allows us to evaluate Megatron-DeepSpeed models using the EAI harness directly in this repo, without needing to convert the models into HF format. The...

Good First Issue

This PR resolves #149 by implementing a `ModelInspector` class similar to transformers' [`DebugUnderflowOverflow`](https://huggingface.co/transformers/debugging.html). - Using PyTorch fwd/bwd hooks, log multiple things about each of the model's submodules and its args. 1. fwd/bwd...
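The hook-and-log pattern behind such an inspector can be shown in a pure-Python sketch; the real PR would register `torch` forward/backward hooks on each submodule, whereas here a hypothetical `ModelInspector` simply wraps callables and records the largest output magnitude per call (the kind of statistic used to spot under/overflow):

```python
class ModelInspector:
    """Sketch only: wraps a callable and logs, per call, the max
    absolute value of its outputs, keyed by a module name."""

    def __init__(self):
        self.records = []

    def hook(self, name, fn):
        def wrapped(*args, **kwargs):
            out = fn(*args, **kwargs)
            # Normalize scalar vs. tuple outputs, then record the
            # largest magnitude seen in this call
            vals = [out] if isinstance(out, (int, float)) else list(out)
            self.records.append((name, max(abs(v) for v in vals)))
            return out
        return wrapped

inspector = ModelInspector()
double = inspector.hook("double", lambda x: 2 * x)
double(3.0)
double(-8.0)
print(inspector.records)  # [('double', 6.0), ('double', 16.0)]
```

With real modules, a sudden jump in these per-submodule magnitudes between consecutive steps is exactly the signal `DebugUnderflowOverflow` looks for.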

This PR fixes #203 by using the `args` global variable holder to save and access model parameter counts during gigaflops counting. This is sensible given that the number of model...
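Why the parameter count matters for FLOPs counting: a common back-of-the-envelope estimate (not Megatron's exact counter, which accounts for sequence length, activation recomputation, etc.) is roughly 6 FLOPs per parameter per trained token:

```python
def approx_train_flops(n_params, n_tokens):
    # Rule of thumb: ~2 FLOPs/param/token for the forward pass
    # and ~4 for the backward pass, hence the factor of 6
    return 6 * n_params * n_tokens

# e.g. a 1.3B-parameter model trained on 300B tokens
print(f"{approx_train_flops(1.3e9, 300e9):.2e}")  # 2.34e+21
```

Since `n_params` is fixed for a given run, caching it once in `args` avoids recomputing it on every logging step.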

- Related to #209. Basically re-opening the PR, as it seems to pass locally but not in CI.