Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

124 Megatron-DeepSpeed issues

I'm running the inference script `bloom-ds-inference.py` by invoking `deepspeed --num_gpus 1 ~/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py --name bigscience/bloom-1b3 --benchmark`, but I changed the generation arguments to `generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False)` (adding the `use_cache` option)....
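A minimal sketch of the generation call in question, assuming the standard `transformers` generate API rather than the full benchmark script; the model name comes from the command above, the prompt and `num_tokens` value are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-1b3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

num_tokens = 100  # illustrative value, not the benchmark's default
generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False)

inputs = tokenizer("DeepSpeed is", return_tensors="pt")
# use_cache=False disables the key/value cache, so past states are recomputed
# at every decoding step instead of being reused.
outputs = model.generate(**inputs, **generate_kwargs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```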

This PR adds support for BitFit. I'd like to try BitFit + MTF to retain multilinguality; empirical evidence comes from this [paper](https://arxiv.org/pdf/2109.00904.pdf). Note that adapters also add parameters to the model...
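A hedged sketch of the BitFit idea itself (not this PR's implementation): freeze every parameter except the bias terms, so only biases receive gradient updates during fine-tuning; the model name is just an example:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b3")  # example model

for name, param in model.named_parameters():
    # BitFit: only bias parameters remain trainable.
    param.requires_grad = name.endswith("bias")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total} ({100 * trainable / total:.3f}%)")
```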

Currently, the scripts are working correctly. However, there is a memory leak. In https://github.com/huggingface/accelerate/issues/614 @sgugger says that it's not in accelerate, which is probably true. My hunch is a related...

A very useful tool for understanding model performance beyond the loss: actually showing what the predictions are. It'd be very useful to be able to "see" the output...
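One way such a tool could look, as a hedged sketch under standard `transformers` assumptions (greedy-decode the logits from a forward pass next to the input text; model name and prompt are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-1b3"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

batch = tokenizer(["The capital of France is"], return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits

# Greedy prediction at each position; note position i predicts token i+1,
# so the decoded predictions are shifted by one relative to the input.
pred_ids = logits.argmax(dim=-1)
print("input :", tokenizer.decode(batch["input_ids"][0]))
print("preds :", tokenizer.decode(pred_ids[0]))
```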

Good First Issue

Changes: - Adds compatibility with the lm-evaluation-harness fork of BigScience [here](https://github.com/bigscience-workshop/lm-evaluation-harness). - Reduces the memory load by offloading to CPU earlier, thanks to @thomasw21! Notes: - Almost the same as...
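A rough sketch of the "offload to CPU earlier" idea mentioned above (not the PR's actual code): move tensors off the GPU as soon as they are no longer needed there instead of holding the full state in device memory:

```python
import torch

def offload_state_dict_to_cpu(state_dict):
    """Move every tensor in a state dict to CPU, freeing GPU memory early."""
    for key, tensor in state_dict.items():
        state_dict[key] = tensor.to("cpu", non_blocking=True)
    torch.cuda.empty_cache()  # release the now-unused cached GPU blocks
    return state_dict
```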

As requested by @thomasw21, this is a little hack script to replace the half-precision weights in the existing HF transformers checkpoint (seeded from the universal checkpoint) with fp32 weights, see...
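A minimal sketch of what such a replacement could look like; the file paths are illustrative, not the script's actual ones, and keys are assumed to match between the two checkpoints:

```python
import torch

# fp32 weights derived from the universal checkpoint (illustrative path).
fp32_weights = torch.load("universal_fp32/pytorch_model.bin", map_location="cpu")
# Existing half-precision HF transformers checkpoint (illustrative path).
hf_checkpoint = torch.load("hf_checkpoint/pytorch_model.bin", map_location="cpu")

for key in hf_checkpoint:
    # Overwrite each fp16 tensor with the matching fp32 tensor.
    hf_checkpoint[key] = fp32_weights[key].float()

torch.save(hf_checkpoint, "hf_checkpoint/pytorch_model.bin")
```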

Recently, HuggingFace `transformers` gained a new feature for [int8 quantization](https://github.com/huggingface/transformers/pull/17901) of all HuggingFace models. This feature could reduce the size of large models by a factor of up to 2 without a...
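For reference, loading a model with the int8 feature from that PR looks roughly like the sketch below (it requires `bitsandbytes` and `accelerate` to be installed; the model name is an example):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b3",  # example model
    device_map="auto",       # dispatch layers across available devices
    load_in_8bit=True,       # quantize linear-layer weights to int8
)
```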