gramesh-amd

Results 3 issues of gramesh-amd

**Describe the bug** I am trying to pretrain an [OLMo](https://github.com/allenai/OLMo) 1B model on 8 MI250 GPUs using the Docker image rocm/pytorch:latest (ROCm 6.1). I'm using a small subset of Dolma...

bug
training

How do I do gradient accumulation when training on GPUs? I couldn't find much info on this.
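For context on what is being asked: gradient accumulation generally means summing gradients over several micro-batches and applying a single optimizer update, so the update matches one larger batch. The snippet below is only a minimal, framework-free sketch of that idea on a toy problem. It is not the API of any particular training library, and every name in it (`grad_mse`, `accumulated_step`, the toy data) is hypothetical.

```python
# Hypothetical toy example: fit y = w * x with mean-squared error.
# This illustrates the general technique only, not any library's API.

def grad_mse(w, batch):
    """Gradient of the mean squared error over one micro-batch w.r.t. w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_step(w, micro_batches, lr):
    """Apply one update from gradients accumulated over several micro-batches."""
    accum = 0.0
    for batch in micro_batches:
        # Scale each micro-batch gradient so the running sum ends up being
        # the average over all micro-batches (assumes equal-sized micro-batches).
        accum += grad_mse(w, batch) / len(micro_batches)
    # Single optimizer step (plain SGD here) after all micro-batches are seen.
    return w - lr * accum

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]  # two equal-sized micro-batches

w_accum = accumulated_step(0.0, micro_batches, lr=0.01)
w_full = 0.0 - 0.01 * grad_mse(0.0, data)  # one SGD step on the full batch
```

With equal-sized micro-batches, the accumulated update and the full-batch update coincide, which is the property most gradient accumulation setups rely on.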

Hello, we converted the paxml checkpoint and resumed training with the following config:

```
base_config: "base.yml"
tokenizer_path: "/dockerx/vocab/c4_en_301_5Mexp2_spm.model"
dataset_type: "tfds"
dataset_path: "/ckpts/c4_mlperf_dataset"
dataset_name: "en:3.0.4"
eval_dataset_name: "en:3.0.5"
split: "train2"
tokenize_eval_data: False
eval_data_column:...
```