OLMo 2
Collection: https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
Paper: https://arxiv.org/abs/2501.00656
Version 2 of OLMo released by Ai2.
Comes in 7B and 13B sizes, each with Base and Instruct variants plus additional SFT and DPO checkpoints.
From the paper: "First, we find that OLMo 2 7B and 13B are the best fully-open models to date, often outperforming open-weight models of equivalent size. Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo 0424 model but, notably, OLMo 2 7B outperforms Llama-3.1 8B and OLMo 2 13B outperforms Qwen 2.5 7B despite its lower total training FLOPs. The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance (see the figure in the paper)."
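For anyone wanting to try it once this lands, here is a minimal sketch using LitGPT's Python API. It assumes the checkpoint is registered under its Hugging Face repo name (`allenai/OLMo-2-1124-7B`); adjust to whatever name the final config ends up using.

```python
# Minimal sketch: load an OLMo 2 checkpoint through LitGPT's Python API.
# Assumes this PR registers the config under the Hugging Face repo name.
from litgpt import LLM

llm = LLM.load("allenai/OLMo-2-1124-7B")  # downloads the weights on first use
text = llm.generate("What is the capital of France?", max_new_tokens=32)
print(text)
```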
Hi there, just wanted to say thanks for taking on this PR (I know this is a lot of work)! The OLMo models are awesome, and it'd be great to have OLMo 2 in LitGPT.
Thanks mate!
Currently on vacation, will resume working on this PR once I'm back.
Performed some fixes today, now test_model passes for Olmo2.
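In case it helps reviewers, a quick way to re-run just those cases locally (the `-k` substring is an assumption about how the parametrized test ids are named; adjust it to the actual config name):

```python
# Re-run only the OLMo-related cases of tests/test_model.py.
import sys
import pytest

if __name__ == "__main__":
    sys.exit(pytest.main(["tests/test_model.py", "-k", "OLMo", "-v"]))
```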
seems like almost all tests are failing on:
FAILED tests/test_model.py::test_sdpa_choice_kv_cache[SmolLM2-1.7B-Instruct] - RuntimeError: Deterministic behavior was enabled with either `torch.use_deterministic_algorithms(True)` or `at::Context::setDeterministicAlgorithms(true)`, but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility
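That failure is the known cuBLAS determinism issue, and the example shown hits the SmolLM2 parametrization, so it doesn't look specific to this PR. The error message already names the fix; a sketch of applying it in-process, e.g. at the very top of a conftest.py, though exporting the variable in the shell before launching pytest is the documented route:

```python
import os

# Must be set before the first cuBLAS call in the process, otherwise it is ignored.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch

# The tests enable deterministic algorithms; with the workspace config set above,
# cuBLAS matmuls no longer raise the nondeterminism error.
torch.use_deterministic_algorithms(True)
```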
@lantiga I have addressed the issues and left some follow-up comments.
@Borda Changes should be ready to merge
@t-vi mind having a look as codeowner, pls ^^