Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

124 Megatron-DeepSpeed issues

I'm running the inference script `bloom-ds-inference.py` by invoking `deepspeed --num_gpus 1 ~/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py --name bigscience/bloom-1b3 --benchmark`, but I changed the generation arguments to `generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False)` (adding the `use_cache` option)....
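A minimal sketch of the generation call in question, assuming the standard `transformers` generate API rather than the full benchmark script; the model name comes from the command above, the prompt and `num_tokens` value are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-1b3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

num_tokens = 100  # illustrative value, not the benchmark's default
generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False)

inputs = tokenizer("DeepSpeed is", return_tensors="pt")
# use_cache=False disables the key/value cache, so past states are recomputed
# at every decoding step instead of being reused.
outputs = model.generate(**inputs, **generate_kwargs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```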

This PR adds support for BitFit. I'd like to try BitFit + MTF to retain multilinguality; empirical evidence comes from this [paper](https://arxiv.org/pdf/2109.00904.pdf). Note that adapters also add parameters to the model...
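A hedged sketch of the BitFit idea itself (not this PR's implementation): freeze every parameter except the bias terms, so only biases receive gradient updates during fine-tuning; the model name is just an example:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b3")  # example model

for name, param in model.named_parameters():
    # BitFit: only bias parameters remain trainable.
    param.requires_grad = name.endswith("bias")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total} ({100 * trainable / total:.3f}%)")
```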

Currently, the scripts are working correctly. However, there is a memory leak. In https://github.com/huggingface/accelerate/issues/614 @sgugger says that it's not in accelerate, which is probably true. My hunch is a related...

A very useful tool for understanding model performance beyond the loss: actually showing what the predictions are. It'd be very useful to be able to "see" the output...
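One way such a tool could look, as a hedged sketch under standard `transformers` assumptions (greedy-decode the logits from a forward pass next to the input text; model name and prompt are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-1b3"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

batch = tokenizer(["The capital of France is"], return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits

# Greedy prediction at each position; note position i predicts token i+1,
# so the decoded predictions are shifted by one relative to the input.
pred_ids = logits.argmax(dim=-1)
print("input :", tokenizer.decode(batch["input_ids"][0]))
print("preds :", tokenizer.decode(pred_ids[0]))
```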

Good First Issue

Changes: - Adds compatibility with the lm-evaluation-harness fork of BigScience [here](https://github.com/bigscience-workshop/lm-evaluation-harness). - Reduces the memory load by offloading to CPU earlier, thanks to @thomasw21! Notes: - Almost the same as...
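A rough sketch of the "offload to CPU earlier" idea mentioned above (not the PR's actual code): move tensors off the GPU as soon as they are no longer needed there instead of holding the full state in device memory:

```python
import torch

def offload_state_dict_to_cpu(state_dict):
    """Move every tensor in a state dict to CPU, freeing GPU memory early."""
    for key, tensor in state_dict.items():
        state_dict[key] = tensor.to("cpu", non_blocking=True)
    torch.cuda.empty_cache()  # release the now-unused cached GPU blocks
    return state_dict
```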

As requested by @thomasw21, this is a little hack script to replace the half-precision weights in the existing HF transformers checkpoint (seeded from the universal checkpoint) with fp32 weights, see...
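A minimal sketch of what such a replacement could look like; the file paths are illustrative, not the script's actual ones, and keys are assumed to match between the two checkpoints:

```python
import torch

# fp32 weights derived from the universal checkpoint (illustrative path).
fp32_weights = torch.load("universal_fp32/pytorch_model.bin", map_location="cpu")
# Existing half-precision HF transformers checkpoint (illustrative path).
hf_checkpoint = torch.load("hf_checkpoint/pytorch_model.bin", map_location="cpu")

for key in hf_checkpoint:
    # Overwrite each fp16 tensor with the matching fp32 tensor.
    hf_checkpoint[key] = fp32_weights[key].float()

torch.save(hf_checkpoint, "hf_checkpoint/pytorch_model.bin")
```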

Recently, HuggingFace `transformers` gained a new feature for [int8 quantization](https://github.com/huggingface/transformers/pull/17901) of all HuggingFace models. This feature could reduce the size of large models by a factor of up to 2 without a...
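For reference, loading a model with the int8 feature from that PR looks roughly like the sketch below (it requires `bitsandbytes` and `accelerate` to be installed; the model name is an example):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b3",  # example model
    device_map="auto",       # dispatch layers across available devices
    load_in_8bit=True,       # quantize linear-layer weights to int8
)
```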