Megatron-DeepSpeed issues

Results 124 Megatron-DeepSpeed issues

Load Bloom Optimizer State (i.e. Bloom 1B1)

Hi, I want to continue training the BLOOM model. To start simple, I want to load the 1.1B model into the BigScience Megatron-DeepSpeed library. I tried to run pretrain_gpt.py...

philippmtk

grad norm increase strangely

When I run this [https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/smaller_models/tr11f-6B3-ml-continuation.slurm](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/smaller_models/tr11f-6B3-ml-continuation.slurm) script to continue training the model, I get a strange grad norm and a huge loss after the loss scale is automatically reduced on overflow. Just like I am...

misska1

Load Bloom Optimizer State (i.e. Bloom 1B1)

grad norm increase strangely

Encoding checkpoint reshaping guide

Slower inference results for BLOOM fp16 on identical hardware

Installing Apex on Windows

Add multiple evaluation compat

About convert deepspeed to deepspeed checkpoint

Finetuning BLOOM

Can we also train the BLOOM model using tensor parallelism and efficient fused CUDA kernels

Bump black from 21.4b0 to 24.3.0
