OLMo
Modeling, training, eval, and inference code for OLMo
### 🐛 Describe the bug I'm trying to load an OLMo-1B checkpoint into Hugging Face in order to use the HF inference and trainer scripts. However, I'm having trouble loading the...
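For context on what that report is attempting, here is a minimal sketch of the usual `AutoModel` route for loading OLMo through Hugging Face. It assumes a transformers version with native OLMo support and the `allenai/OLMo-1B-hf` HF-format checkpoint; both the repo id and the helper name `load_olmo` are illustrative assumptions, not details from the issue.

```python
def load_olmo(repo_id: str = "allenai/OLMo-1B-hf"):
    """Hypothetical helper: fetch an OLMo tokenizer and model from the HF Hub.

    The import lives inside the function so the sketch only needs
    transformers installed when weights are actually loaded.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model

# Usage (downloads weights, so not executed here):
# tokenizer, model = load_olmo()
```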
### 🐛 Describe the bug Following the instructions at https://github.com/allenai/OLMo used to work a few days ago. I tried again yesterday and today, and it now throws the following warning/error: `You should...
### 🐛 Describe the bug I think there are two problems with multi-epoch training: - Training finishes if you set e.g. `duration: 2e12T` and one epoch is < 2e12 tokens. It currently...
### ❓ The question This is a cross-post from https://github.com/allenai/OLMo-Eval/issues/31 for visibility. I ran [olmo_eval](https://github.com/allenai/OLMo-Eval) with [allenai/OLMo-1B](https://huggingface.co/allenai/OLMo-1B) on the Paloma dataset and noticed two issues: 1. The evaluation metric...
I may be missing some nuances of the checkpointing, but can we do something akin to this PR to avoid trying to load the trainer state when the file is...
fix spelling FDSP->FSDP
There is an order-of-magnitude difference in loss between the two setups. @dirkgr @epwalsh can you sanity-check the OLMo grad_clipping code for FSDP no_shard/DDP? On 3...
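For sanity-checking runs like the one above, the clipping math can be reproduced outside the trainer. The sketch below is a generic norm-based gradient clip (not OLMo's actual implementation): gradients are scaled by `max_norm / total_norm` whenever the global norm exceeds the threshold, which is the quantity worth comparing between the DDP and FSDP no_shard setups.

```python
import numpy as np

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale a list of gradient arrays so their global L2 norm is <= max_norm.

    Returns (possibly scaled grads, the pre-clip total norm).
    """
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        grads = [g * scale for g in grads]
    return grads, total_norm
```

Note that under sharded FSDP each rank only holds part of the gradients, so the local norm is partial and the global norm must be reduced across ranks before scaling; a mismatch there is one plausible source of a large loss gap between setups.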
### ❓ The question Right now I see in the [pyproject.toml](https://github.com/allenai/OLMo/blob/main/pyproject.toml) that OLMo requires torch
Scripts for power law fitting
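Power-law fitting of the kind such scripts perform can be sketched as a least-squares fit in log-log space. The form `L(N) = a * N**(-b)` and the helper name `fit_power_law` are illustrative assumptions, not the repository's actual scripts.

```python
import numpy as np

def fit_power_law(ns, losses):
    """Fit L(N) = a * N**(-b) by linear regression on log-log axes.

    log L = log a - b * log N, so the slope gives -b and the
    intercept gives log a. Returns (a, b).
    """
    slope, intercept = np.polyfit(np.log(ns), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)

# Usage with synthetic data generated from a known power law:
# ns = np.array([1e7, 1e8, 1e9, 1e10])
# losses = 5.0 * ns ** -0.3
# a, b = fit_power_law(ns, losses)   # recovers a ≈ 5.0, b ≈ 0.3
```

The log-log trick only fits the pure power law; curves with an irreducible-loss offset `a * N**(-b) + c` need a nonlinear optimizer instead.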
### ❓ The question I use the default config `configs/official/OLMo-1B.yaml` with the wandb config removed, and train the model on 8×A800 GPUs with the command `torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml` for...