Megatron-DeepSpeed issues

Results: 124 Megatron-DeepSpeed issues

Create AWS image with our tools and codebase

to make it easy for new users to use our setup

ibeltagy

enhancement

Add checks to confirm that the checkpoint conversion script works correctly

We now have a script that converts Megatron-DeepSpeed checkpoints to HF Transformers checkpoints. The project is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/tools/convert_checkpoint) and the script is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/tools/convert_checkpoint/deepspeed_to_transformers.py). However, the script doesn't have unit tests that confirm that...

ibeltagy

Good First Issue

[WIP] [fp32 checkpoint] very early experiments with extracting fp32 params

I just started to look at how to adapt [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/51a2e916b730cf676c66532b19d973a603377cb0/deepspeed/utils/zero_to_fp32.py) to extract fp32 weights from optimizer states. I will park this for now since it was said today fp16 weights...

stas00

MLFlow

[testing] data size / dynamic downloads - test speed and repo bloat

Let's discuss which data is used in the test suite. After the discussion, we can turn the conclusions into guidelines for test writers. Here is a very rough start: * We want to...

stas00

Create AWS image with our tools and codebase

Add checks to confirm that the checkpoint conversion script works correctly

[WIP] [fp32 checkpoint] very early experiments with extracting fp32 params

extend preprocess_data_dist to handle jsonl files

Double counts in parameter count

[requirements] check we test against the correct deepspeed branch

wip [CI] dealing with concurrency

WIP: distributed terashuf

Set up a basic MLflow setup

[testing] data size / dynamic downloads - test speed and repo bloat
