OLMo
Modeling, training, eval, and inference code for OLMo
### 🐛 Describe the bug I'm trying to load an OLMo-1B checkpoint into Hugging Face in order to use the HF inference and trainer scripts. However, I'm having trouble loading the...
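For context on what that report is attempting, here is a minimal sketch of the usual `AutoModel` route for loading OLMo through Hugging Face. It assumes a transformers version with native OLMo support and the `allenai/OLMo-1B-hf` HF-format checkpoint; both the repo id and the helper name `load_olmo` are illustrative assumptions, not details from the issue.

```python
def load_olmo(repo_id: str = "allenai/OLMo-1B-hf"):
    """Hypothetical helper: fetch an OLMo tokenizer and model from the HF Hub.

    The import lives inside the function so the sketch only needs
    transformers installed when weights are actually loaded.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model

# Usage (downloads weights, so not executed here):
# tokenizer, model = load_olmo()
```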
### 🐛 Describe the bug Following the instructions at https://github.com/allenai/OLMo used to work a few days ago. I tried again yesterday and today, and it now throws the following warning/error: `You should...
### 🐛 Describe the bug I think there are two problems with multi-epoch training: - Training finishes if you set e.g. `duration: 2e12T` and one epoch is < 2e12 tokens. It currently...
### ❓ The question This is a cross-post from https://github.com/allenai/OLMo-Eval/issues/31 for visibility. I ran [olmo_eval](https://github.com/allenai/OLMo-Eval) with [allenai/OLMo-1B](https://huggingface.co/allenai/OLMo-1B) on the Paloma dataset and noticed two issues: 1. The evaluation metric...
I may be missing some nuances of the checkpointing, but can we do something akin to this PR to avoid trying to load the trainer state when the file is...
fix spelling FDSP->FSDP
There is an order-of-magnitude difference in loss between the two setups. @dirkgr @epwalsh can you sanity-check the OLMo grad_clipping code for FSDP no_shard/DDP? On 3...
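For sanity-checking runs like the one above, the clipping math can be reproduced outside the trainer. The sketch below is a generic norm-based gradient clip (not OLMo's actual implementation): gradients are scaled by `max_norm / total_norm` whenever the global norm exceeds the threshold, which is the quantity worth comparing between the DDP and FSDP no_shard setups.

```python
import numpy as np

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale a list of gradient arrays so their global L2 norm is <= max_norm.

    Returns (possibly scaled grads, the pre-clip total norm).
    """
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        grads = [g * scale for g in grads]
    return grads, total_norm
```

Note that under sharded FSDP each rank only holds part of the gradients, so the local norm is partial and the global norm must be reduced across ranks before scaling; a mismatch there is one plausible source of a large loss gap between setups.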
### ❓ The question Right now I see in the [pyproject.toml](https://github.com/allenai/OLMo/blob/main/pyproject.toml) that OLMo requires torch
Scripts for power law fitting
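Power-law fitting of the kind such scripts perform can be sketched as a least-squares fit in log-log space. The form `L(N) = a * N**(-b)` and the helper name `fit_power_law` are illustrative assumptions, not the repository's actual scripts.

```python
import numpy as np

def fit_power_law(ns, losses):
    """Fit L(N) = a * N**(-b) by linear regression on log-log axes.

    log L = log a - b * log N, so the slope gives -b and the
    intercept gives log a. Returns (a, b).
    """
    slope, intercept = np.polyfit(np.log(ns), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)

# Usage with synthetic data generated from a known power law:
# ns = np.array([1e7, 1e8, 1e9, 1e10])
# losses = 5.0 * ns ** -0.3
# a, b = fit_power_law(ns, losses)   # recovers a ≈ 5.0, b ≈ 0.3
```

The log-log trick only fits the pure power law; curves with an irreducible-loss offset `a * N**(-b) + c` need a nonlinear optimizer instead.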
### ❓ The question I use the default config `configs/official/OLMo-1B.yaml` with the wandb config removed, and train the model on 8×A800 GPUs with the command `torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml` for...