OLMo 2
Collection: https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
Paper: https://arxiv.org/abs/2501.00656
Version 2 of OLMo released by Ai2.
Comes in 7B and 13B sizes, each with Base and Instruct variants plus additional SFT and DPO checkpoints.
From the paper: "First, we find that OLMo 2 7B and 13B are the best fully-open models to date, often outperforming open-weight models of equivalent size. Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo 0424 model but, notably, OLMo 2 7B outperforms Llama-3.1 8B and OLMo 2 13B outperforms Qwen 2.5 7B despite its lower total training FLOPs. The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance (see the figure in the paper)."
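For anyone wanting to try it once this lands, here is a minimal sketch using LitGPT's Python API. It assumes the checkpoint is registered under its Hugging Face repo name (`allenai/OLMo-2-1124-7B`); adjust to whatever name the final config ends up using.

```python
# Minimal sketch: load an OLMo 2 checkpoint through LitGPT's Python API.
# Assumes this PR registers the config under the Hugging Face repo name.
from litgpt import LLM

llm = LLM.load("allenai/OLMo-2-1124-7B")  # downloads the weights on first use
text = llm.generate("What is the capital of France?", max_new_tokens=32)
print(text)
```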
Hi there, just wanted to say thanks for taking on this PR (I know this is a lot of work)! The OLMo models are awesome, and it'd be great to have OLMo 2 in LitGPT.
Thanks mate!
Currently on vacation, will resume working on this PR once I'm back.
Performed some fixes today, now test_model passes for Olmo2.
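In case it helps reviewers, a quick way to re-run just those cases locally (the `-k` substring is an assumption about how the parametrized test ids are named; adjust it to the actual config name):

```python
# Re-run only the OLMo-related cases of tests/test_model.py.
import sys
import pytest

if __name__ == "__main__":
    sys.exit(pytest.main(["tests/test_model.py", "-k", "OLMo", "-v"]))
```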
seems like almost all tests are failing on:
FAILED tests/test_model.py::test_sdpa_choice_kv_cache[SmolLM2-1.7B-Instruct] - RuntimeError: Deterministic behavior was enabled with either `torch.use_deterministic_algorithms(True)` or `at::Context::setDeterministicAlgorithms(true)`, but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility
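That failure is the known cuBLAS determinism issue, and the example shown hits the SmolLM2 parametrization, so it doesn't look specific to this PR. The error message already names the fix; a sketch of applying it in-process, e.g. at the very top of a conftest.py, though exporting the variable in the shell before launching pytest is the documented route:

```python
import os

# Must be set before the first cuBLAS call in the process, otherwise it is ignored.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch

# The tests enable deterministic algorithms; with the workspace config set above,
# cuBLAS matmuls no longer raise the nondeterminism error.
torch.use_deterministic_algorithms(True)
```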
@lantiga I have addressed the issues and left some follow-up comments.
@Borda Changes should be ready to merge
@t-vi mind having a look as codeowner, pls ^^