gramesh-amd issues

Results: 3 issues by gramesh-amd

[BUG] Gradient accumulation causes training loss differences between DeepSpeed and FSDP

**Describe the bug** I am trying to pretrain an [OLMo](https://github.com/allenai/OLMo) 1B model on 8 MI250 GPUs with the Docker image `rocm/pytorch:latest` (ROCm 6.1). I'm using a small subset of Dolma...

Labels: bug, training

How to do gradient accumulation?

I couldn't find much information on how to do gradient accumulation when training on GPUs.
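For reference, here is a minimal, framework-agnostic sketch of what gradient accumulation computes. It is not the answer given in the issue thread, nor any particular library's API. It assumes a toy 1-D linear model `y_hat = w * x` with mean-squared-error loss, and all function names (`grad_mse`, `accumulated_grad`, `train`) are hypothetical. The batch is split into micro-batches, each micro-batch gradient is weighted by its share of the full batch and summed, and the optimizer steps once per full batch, so the result matches the full-batch gradient.

```python
# Hypothetical sketch: gradient accumulation on a toy 1-D linear model
# (y_hat = w * x, mean-squared-error loss). Pure Python, no framework.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Sum micro-batch gradients, each weighted by its share of the batch."""
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Weighting by len(mx) / n keeps the sum equal to the full-batch
        # mean gradient even when the last micro-batch is smaller.
        total += grad_mse(w, mx, my) * len(mx) / n
    return total

def train(xs, ys, lr=0.01, steps=100, micro_batch_size=2):
    """Plain SGD with one parameter update per accumulated full batch."""
    w = 0.0
    for _ in range(steps):
        g = accumulated_grad(w, xs, ys, micro_batch_size)
        w -= lr * g
    return w

if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
    print(grad_mse(0.0, xs, ys), accumulated_grad(0.0, xs, ys, 2))
    print(train(xs, ys))
```

With `xs = [1, 2, 3, 4, 5]` and micro-batches of size 2, both the full-batch and the accumulated gradient at `w = 0` come out to -44.0, whereas an unweighted average of the three micro-batch means would give about -53.3. In PyTorch training loops the same idea is commonly written as dividing each micro-batch loss by the number of accumulation steps before `backward()` and calling `optimizer.step()` once per accumulation window; that shortcut is only exact when all micro-batches carry the same number of samples (or tokens).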

Converted MLPerf GPT-3 checkpoint starts with a worse loss

Hello, we converted the paxml checkpoint and resumed training with the following config:

```
base_config: "base.yml"
tokenizer_path: "/dockerx/vocab/c4_en_301_5Mexp2_spm.model"
dataset_type: "tfds"
dataset_path: "/ckpts/c4_mlperf_dataset"
dataset_name: "en:3.0.4"
eval_dataset_name: "en:3.0.5"
split: "train2"
tokenize_eval_data: False
eval_data_column:...
```