Bailey Kuehl

42 comments by Bailey Kuehl

If you only need the data order for training, you do not need to extract it explicitly or use this script. You can just add the `--stop_at` parameter...

You should be able to use this script: https://github.com/allenai/OLMo/blob/main/scripts/inspect_train_data.py This script will give you the global indices, which will let you reproduce the training sequence without actually running the data_loader.
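To illustrate the idea behind that script: the global indices are just a shuffled ordering over fixed-length training instances, so you can look instances up directly without a data loader. This is a minimal self-contained sketch with a fake token store and fake indices; the real files, their dtypes, and their layout come from your training run, not from this code.

```python
import numpy as np

SEQ_LEN = 4

# Pretend token store: 6 fixed-length training instances, flattened
# into one array (the real store is a memory-mapped file on disk).
tokens = np.arange(6 * SEQ_LEN, dtype=np.uint16)

# Pretend shuffled training order (what the global indices give you).
global_indices = np.array([3, 0, 5, 1, 4, 2], dtype=np.uint32)

def get_instance(idx: int) -> np.ndarray:
    """Return the token sequence for the idx-th instance in the store."""
    start = idx * SEQ_LEN
    return tokens[start : start + SEQ_LEN]

# Reproduce the first three instances in training order, without
# ever constructing the actual data loader.
training_order = [get_instance(i).tolist() for i in global_indices[:3]]
print(training_order)
```

The same lookup works against the real files by replacing the in-memory arrays with `np.memmap` views of the dataset and indices on disk.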

Hi, thanks for the inquiry! This is something we did consider, but unfortunately, we don't currently have the raw data available sectioned out into these specific compositions. We have the...

Hi, thanks for the question! To generate the request files, you can use our eval repo: https://github.com/allenai/oe-eval. You'll need to add zipped data in the same jsonl format (following the...
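As a rough sketch of what "zipped data in jsonl format" means: one JSON object per line, gzip-compressed. The field names below are illustrative assumptions only; the actual request schema is defined by the oe-eval repo.

```python
import gzip
import json

# Illustrative request records. The real field names/schema come from
# oe-eval's existing request files; these keys are assumptions.
requests = [
    {"doc_id": 0, "prompt": "Question: 2+2=?\nAnswer:", "label": "4"},
    {"doc_id": 1, "prompt": "Question: 3+5=?\nAnswer:", "label": "8"},
]

# Write gzip-compressed jsonl: one JSON object per line.
path = "requests.jsonl.gz"
with gzip.open(path, "wt", encoding="utf-8") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

# Round-trip check that the file parses back line by line.
with gzip.open(path, "rt", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))
```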

There are a few things I can recommend, depending on what you're trying to do. If you simply want to evaluate some models, then I'd recommend the oe-eval repository, and...

Hi, thanks for sharing that additional information! The OLMES suite is actually suited for models on the Hugging Face Hub, as well as [local (or remote) paths](https://github.com/allenai/olmes/blob/47b90730b3f4abb4cb6c59b6e2ecf4122d9eb11b/oe_eval/configs/models.py#L6), so this should work...

No, we did not. We do not do generative GSM8K evaluations in-loop. The reason it appears as a multiple-choice task is that we are evaluating perplexity over the human-written...
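To make the perplexity-over-the-gold-answer idea concrete: given the model's per-token log-probabilities of the human-written answer, perplexity is `exp` of the average negative log-likelihood, and a lower value means the checkpoint fits the reference answer better. The numbers below are made up for illustration, not real GSM8K scores.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence from its per-token natural log-probs."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Made-up per-token log-probs of the gold (human-written) answer under
# two hypothetical checkpoints; no generation happens in this setup.
ckpt_a = [-0.2, -0.5, -0.1, -0.4]
ckpt_b = [-1.0, -1.2, -0.8, -1.1]

# Checkpoint A assigns higher probability to the reference answer.
print(perplexity(ckpt_a) < perplexity(ckpt_b))
```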

Hi, for using custom models, my message from last week still applies: > The OLMES suite is actually suited for models on the Hugging Face Hub, as well as [local (or...

Hi there, you're getting this error because you need to first convert your model into one of those formats, as I mentioned in my message above: > This will work...

Hi there! Thanks for the kind words and the inquiry! Our pretraining data was intentionally scoped down to include only English -- we used FastText classifiers to remove non-English text,...
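As a sketch of what that language-filtering step looks like: classify each document, then keep only confident English predictions. The real pipeline used FastText language-ID models (`model.predict(text)` on a loaded fastText model); here the classifier is replaced with a crude stand-in stub so the example is self-contained, and the threshold value is an assumption.

```python
def stub_langid(text: str):
    """Stand-in for a FastText language-ID model's predict() call.
    Returns (label, confidence). This crude ASCII heuristic is NOT the
    real classifier; real usage loads a fastText lid model instead."""
    ascii_ratio = sum(c.isascii() for c in text) / max(len(text), 1)
    label = "__label__en" if ascii_ratio > 0.9 else "__label__other"
    return label, ascii_ratio

def keep_english(docs, threshold=0.9):
    """Keep documents labeled English with confidence >= threshold."""
    kept = []
    for doc in docs:
        label, conf = stub_langid(doc)
        if label == "__label__en" and conf >= threshold:
            kept.append(doc)
    return kept

docs = ["The quick brown fox jumps.", "Быстрая лиса прыгает."]
print(keep_english(docs))
```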