
Modeling, training, eval, and inference code for OLMo

245 OLMo issues

Trying torch scripting and applying the rotations in the complex plane instead of R²
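Rotating a feature pair by an angle θ in R² is algebraically identical to multiplying the complex number `x0 + i*x1` by `e^{iθ}`; that equivalence is what the complex-plane variant of the rotary embedding relies on. A minimal pure-Python sketch of the two formulations (function names are illustrative, not the repo's API):

```python
import math

def rotate_r2(x0, x1, theta):
    # Rotary-embedding-style rotation of one feature pair in R^2.
    return (x0 * math.cos(theta) - x1 * math.sin(theta),
            x0 * math.sin(theta) + x1 * math.cos(theta))

def rotate_complex(x0, x1, theta):
    # The same rotation expressed as complex multiplication by e^{i*theta}.
    z = complex(x0, x1) * complex(math.cos(theta), math.sin(theta))
    return (z.real, z.imag)

# The two formulations agree to floating-point precision.
a = rotate_r2(0.3, -1.2, 0.7)
b = rotate_complex(0.3, -1.2, 0.7)
assert all(abs(p - q) < 1e-12 for p, q in zip(a, b))
```

In practice the complex form can be vectorized over the full head dimension (e.g. via `torch.view_as_complex`), which is the motivation for trying it alongside torch scripting.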

It is suspicious that we had two slightly different models (one with biases, one without), that both spiked at exactly the same moment. This suggests there might be a data...

project/model
project/data

What happens now: Our runs produce "checkpoint directories". You might have seen them. Checkpoint directories contain a bunch of debris from a run, including between 0 and n actual...

The problem is that on LUMI, FSDP doesn't overlap computation and communication like it should. Evidence comes from this profiler trace: ![Image](https://github.com/allenai/LLM/assets/920638/9d5d4437-adc7-485d-97c3-4cf71643808f) It may be noteworthy that the NCCL GPU...

- Does not yet support checkpointing
- `configs/olmo-small-ablation-lumi-deepspeed.yaml` is the same as `configs/olmo-small-ablation-lumi.yaml` except for `deepspeed: true` & `init_device: cpu`
- `scripts/lumi/olmo-small-ablation-on-lumi-test.sh` is the same as `scripts/lumi/olmo-small-ablation-on-lumi-test-deepspeed.sh` except for `export...`

This is the `kebab` config, a smaller version of the `dirk` config. Differences from `dirk`:
* untied weights
* weight decay on everything
* adjusted `mlp_hidden_size` so we come out...

Updating the Llama config to use Llama block and RoPE lower precision, to match the behavior of bf16-autocast Llama more closely.

## Update - 11/3/23

Mitch is a big fan of Z-loss. Currently they're running Z-loss, no weight tying, LR=1e-3, wd=0.1, QK norm. So with Z-loss (and potentially QK norm) it...
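Z-loss here refers to the PaLM-style auxiliary stabilization term: a penalty on the squared log of the softmax normalizer `Z = sum(exp(logit))`, which discourages the output logits from drifting to large magnitudes. A minimal sketch in plain Python (the coefficient `coef=1e-4` is an assumed illustrative value, not necessarily what this run uses):

```python
import math

def z_loss(logits, coef=1e-4):
    # Auxiliary z-loss: coef * (log Z)^2, where Z is the softmax
    # normalizer over the final-layer logits. When the logits are
    # already well normalized (log Z == 0), the penalty vanishes.
    log_z = math.log(sum(math.exp(l) for l in logits))
    return coef * log_z ** 2

# A single zero logit gives Z = 1, so the penalty is exactly zero.
assert z_loss([0.0]) == 0.0
# Large-magnitude logits are penalized.
assert z_loss([10.0, 12.0]) > z_loss([0.1, 0.2])
```

In training this term is added to the cross-entropy loss per token; because its gradient pushes `log Z` toward zero, it acts as a soft normalizer on the logits without changing the softmax probabilities it is computed from.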