Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
I don't know whether this is intended to work or not, but I found that the following program:

```python
from megatron.data.indexed_dataset import IndexedDatasetBuilder, best_fitting_dtype

best_dtype = best_fitting_dtype(10_000)
IndexedDatasetBuilder("testfile", dtype=best_dtype)
```

leads...
**Motivation**. As @sashavor suggested, the carbon footprint working group needs an experiment tracker to properly track all the runs being done. An experiment tracker could also be more broadly interesting to...
The mC4 data is too large: for the 13 selected languages it's around 18 TB of data. I excluded the English data since teven already processed it. Arabic, Swahili (Bantu), Chinese, Catalan, English,...
In this issue, we discuss how viable/interesting it might be to implement a DeBERTa-like attention mechanism: https://arxiv.org/abs/2006.03654. Things to take into account:
- performance enhancements: check with the HF pretrained model...
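For orientation, here is a minimal single-head sketch of DeBERTa's disentangled attention (content-to-content, content-to-position, and position-to-content terms) as described in the paper. All names (`DisentangledAttention`, `max_rel_pos`, etc.) are illustrative assumptions, not the Megatron-DeepSpeed or HF implementation:

```python
# Simplified, single-head sketch of DeBERTa-style disentangled attention
# (https://arxiv.org/abs/2006.03654). Illustrative only.
import math
import torch
import torch.nn as nn

class DisentangledAttention(nn.Module):
    def __init__(self, hidden, max_rel_pos=128):
        super().__init__()
        self.max_rel_pos = max_rel_pos
        self.q_c = nn.Linear(hidden, hidden)  # content query
        self.k_c = nn.Linear(hidden, hidden)  # content key
        self.v = nn.Linear(hidden, hidden)    # value
        self.q_r = nn.Linear(hidden, hidden)  # position query (position->content)
        self.k_r = nn.Linear(hidden, hidden)  # position key (content->position)
        # relative position embeddings; shared across layers in the paper
        self.rel_emb = nn.Embedding(2 * max_rel_pos, hidden)

    def forward(self, x):  # x: [batch, seq, hidden]
        b, n, d = x.shape
        qc, kc, v = self.q_c(x), self.k_c(x), self.v(x)
        # clipped relative distance delta(i, j) = i - j, shifted into [0, 2K)
        pos = torch.arange(n, device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(
            -self.max_rel_pos, self.max_rel_pos - 1) + self.max_rel_pos
        kr = self.k_r(self.rel_emb.weight)  # [2K, d] position keys
        qr = self.q_r(self.rel_emb.weight)  # [2K, d] position queries

        c2c = qc @ kc.transpose(-1, -2)  # content->content, [b, n, n]
        # content->position: A[i, j] += qc_i . kr_{delta(i, j)}
        c2p = torch.gather(qc @ kr.t(), 2, rel.expand(b, -1, -1))
        # position->content: A[i, j] += kc_j . qr_{delta(j, i)}
        p2c = torch.gather(kc @ qr.t(), 2, rel.expand(b, -1, -1)).transpose(1, 2)

        scores = (c2c + c2p + p2c) / math.sqrt(3 * d)  # 1/sqrt(3d) scaling
        return scores.softmax(dim=-1) @ v
```

The 1/sqrt(3d) scaling follows the paper's argument that three score terms are summed; a real implementation would also need multi-head splitting and the fused/TP-aware layouts used in this repo.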
After having a 3->8->3 spike in the loss value a few days ago, which luckily recovered after a few hours of training, we want to discuss possible ready-to-use...
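One common family of mitigations is to detect a spike against a recent loss average and skip the offending update (or roll back to the last checkpoint). A minimal sketch, where `SpikeGuard`, the window, and the threshold factor are illustrative assumptions rather than anything in this repo:

```python
# Hypothetical loss-spike guard: compare each new loss to a running average
# and flag updates that exceed factor * average. Illustrative only.
from collections import deque

class SpikeGuard:
    def __init__(self, window=100, factor=2.0):
        self.history = deque(maxlen=window)
        self.factor = factor

    def should_skip(self, loss):
        """Return True if `loss` spikes above factor * recent average."""
        if len(self.history) == self.history.maxlen:
            avg = sum(self.history) / len(self.history)
            if loss > self.factor * avg:
                # don't append: keep the spike out of the running average
                return True
        self.history.append(loss)
        return False
```

In a training loop one would call `optimizer.step()` only when `guard.should_skip(loss.item())` is false; rollback-based variants reload the last checkpoint instead of skipping.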
Follow appendix A.1 of https://arxiv.org/pdf/1812.06162.pdf to implement monitoring of the gradient noise scale and add it to the TensorBoard log.
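For reference, appendix A.1 estimates the simple noise scale B_simple = tr(Σ)/|G|² from gradient norms measured at two batch sizes; in data-parallel training these can conveniently be the per-worker gradient and the all-reduced gradient. A minimal sketch of the estimator (the function name and EMA note are our assumptions; the formulas are the paper's):

```python
# Simple gradient noise scale estimator, following appendix A.1 of
# https://arxiv.org/pdf/1812.06162.pdf. Illustrative sketch.
def noise_scale(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate B_simple = S / |G|^2 from two gradient-norm measurements.

    g_small_sq: squared gradient norm from a batch of size b_small
    g_big_sq:   squared gradient norm from a batch of size b_big
    """
    # unbiased estimate of the true squared gradient norm |G|^2
    g_sq = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    # unbiased estimate of tr(Sigma)
    s = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return s / g_sq
```

The paper recommends smoothing S and |G|² with separate exponential moving averages before taking the ratio, since each per-step estimate is noisy; the smoothed ratio is what would go to TensorBoard.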
1. A recent commit removed `tools/convert_checkpoint/deepspeed_checkpoint.py`, but there is still an attempt to import it in `tools/convert_checkpoint/deepspeed_to_megatron.py`. The other scripts in the folder appear to be fine. I guess the...
A tentative attempt at applying teacher-student training using Megatron-DeepSpeed. WIP draft PR, not meant to be merged. cc @thomasw21
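For context, the usual teacher-student (knowledge distillation) objective blends a temperature-softened KL term against the teacher's logits with the ordinary hard-label loss. A generic sketch, not the code in this draft PR; the `temperature` and `alpha` values are illustrative:

```python
# Generic knowledge-distillation loss sketch. Illustrative only.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL against the teacher with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard-loss scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The temperature² factor keeps the soft-target gradients on the same scale as the hard loss when the temperature changes, a standard choice in distillation setups.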
Patch ROCm support into the fused kernels, ported from Microsoft/Megatron-DeepSpeed-fork.
I want to ask why we cannot make the tp and pp of a checkpoint bigger? For example, making tp=4 when its original tp is 2. I tried to do...
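Conceptually, growing tensor parallelism means splitting each existing shard along its partition axis. A minimal sketch under the assumption of a plain column-parallel weight; everything here, including `split_tp_partitions`, is illustrative and not a supported conversion path:

```python
# Sketch: growing tp=2 -> tp=4 by halving each shard along its partition
# axis (column-parallel weights split along dim 0, row-parallel along dim 1,
# as an assumption). Illustrative only.
import torch

def split_tp_partitions(partitions, factor=2, dim=0):
    """Split each existing TP shard into `factor` shards along `dim`."""
    new_partitions = []
    for p in partitions:
        new_partitions.extend(torch.chunk(p, factor, dim=dim))
    return new_partitions

# e.g. two tp=2 shards of a column-parallel weight -> four tp=4 shards
shards_tp2 = [torch.randn(2048, 1024), torch.randn(2048, 1024)]
shards_tp4 = split_tp_partitions(shards_tp2, factor=2, dim=0)
assert len(shards_tp4) == 4 and shards_tp4[0].shape == (1024, 1024)
```

The practical difficulty is that fused QKV layouts, vocab-parallel embeddings, and partitioned optimizer state all have to be re-sharded consistently at the same time, which makes a general tp/pp-growing conversion nontrivial.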