OLMo
Modeling, training, eval, and inference code for OLMo
Trying TorchScript and applying the rotations in the complex plane instead of in R²
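The two formulations of a rotary embedding rotation are mathematically equivalent: rotating a pair (x₀, x₁) by an angle with a 2×2 rotation matrix in R² gives the same result as treating the pair as the complex number x₀ + i·x₁ and multiplying by e^{iθ}. A minimal sketch of the equivalence (hypothetical helper functions, not the OLMo implementation, shown for a single pair and angle):

```python
import math

def rotate_r2(x0, x1, angle):
    # Rotation in R^2 via an explicit 2x2 rotation matrix.
    c, s = math.cos(angle), math.sin(angle)
    return (x0 * c - x1 * s, x0 * s + x1 * c)

def rotate_complex(x0, x1, angle):
    # The same rotation as multiplication by e^{i*angle} in the complex plane.
    z = complex(x0, x1) * complex(math.cos(angle), math.sin(angle))
    return (z.real, z.imag)
```

In real RoPE each head dimension pair gets its own frequency and the angle is position × frequency; the complex-plane form just packs the matrix multiply into one complex multiplication per pair.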
It is suspicious that we had two slightly different models (one with biases, one without) that both spiked at exactly the same moment. This suggests there might be a data...
## What happens now

Our runs produce "checkpoint directories". You might have seen them. Checkpoint directories contain a bunch of debris from a run, including between 0 and n actual...
The problem is that on LUMI, FSDP doesn't overlap computation and communication like it should. Evidence comes from this profiler trace:

[profiler trace]

It may be noteworthy that the NCCL GPU...
- Does not yet support checkpointing
- `configs/olmo-small-ablation-lumi-deepspeed.yaml` is the same as `configs/olmo-small-ablation-lumi.yaml` except for `deepspeed: true` & `init_device: cpu`
- `scripts/lumi/olmo-small-ablation-on-lumi-test.sh` is the same as `scripts/lumi/olmo-small-ablation-on-lumi-test-deepspeed.sh` except for `export...`
This is the `kebab` config, a smaller version of the `dirk` config. Differences from `dirk`:

* untied weights
* weight decay on everything
* adjusted `mlp_hidden_size` so we come out...
Updating the Llama config to use Llama block and RoPE lower precision, to match the behavior of bf16-autocast Llama more closely.
## Update - 11/3/23

Mitch is a big fan of Z-loss. Currently they're running Z-loss, no weight tying, LR=1e-3, wd=0.1, QK norm. So with Z-loss (and potentially QK norm) it...
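For reference, Z-loss is an auxiliary penalty on the log of the softmax normalizer Z = Σ exp(logitᵢ), added to the cross-entropy loss to keep log Z near zero and stabilize training. A minimal sketch in plain Python (the coefficient and function name are illustrative, not the values or code used in these runs):

```python
import math

def z_loss(logits, coef=1e-4):
    """Auxiliary loss coef * (log Z)^2, where Z is the softmax normalizer.

    Illustrative sketch only; `coef` here is a placeholder, not the
    coefficient used in the runs described above.
    """
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return coef * log_z ** 2
```

Because the penalty is on (log Z)², it pushes the normalizer toward 1 without changing the softmax probabilities themselves.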