Bailey Kuehl

42 comments by Bailey Kuehl

If you only need the data order for training, you do not need to extract it explicitly or use this script. You can just add the `--stop_at` parameter...

You should be able to use this script: https://github.com/allenai/OLMo/blob/main/scripts/inspect_train_data.py This script will give you the global indices, which will let you reproduce the training sequence without actually running the data_loader.
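To illustrate the idea behind that script: the global indices are just a shuffled ordering over fixed-length training instances, so you can look instances up directly without a data loader. This is a minimal self-contained sketch with a fake token store and fake indices; the real files, their dtypes, and their layout come from your training run, not from this code.

```python
import numpy as np

SEQ_LEN = 4

# Pretend token store: 6 fixed-length training instances, flattened
# into one array (the real store is a memory-mapped file on disk).
tokens = np.arange(6 * SEQ_LEN, dtype=np.uint16)

# Pretend shuffled training order (what the global indices give you).
global_indices = np.array([3, 0, 5, 1, 4, 2], dtype=np.uint32)

def get_instance(idx: int) -> np.ndarray:
    """Return the token sequence for the idx-th instance in the store."""
    start = idx * SEQ_LEN
    return tokens[start : start + SEQ_LEN]

# Reproduce the first three instances in training order, without
# ever constructing the actual data loader.
training_order = [get_instance(i).tolist() for i in global_indices[:3]]
print(training_order)
```

The same lookup works against the real files by replacing the in-memory arrays with `np.memmap` views of the dataset and indices on disk.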

Hi, thanks for the inquiry! This is something we did consider, but unfortunately, we don't currently have the raw data available sectioned out into these specific compositions. We have the...

Hi, thanks for the question! To generate the request files, you can use our eval repo: https://github.com/allenai/oe-eval. You'll need to add zipped data in the same jsonl format (following the...
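As a rough sketch of what "zipped data in jsonl format" means: one JSON object per line, gzip-compressed. The field names below are illustrative assumptions only; the actual request schema is defined by the oe-eval repo.

```python
import gzip
import json

# Illustrative request records. The real field names/schema come from
# oe-eval's existing request files; these keys are assumptions.
requests = [
    {"doc_id": 0, "prompt": "Question: 2+2=?\nAnswer:", "label": "4"},
    {"doc_id": 1, "prompt": "Question: 3+5=?\nAnswer:", "label": "8"},
]

# Write gzip-compressed jsonl: one JSON object per line.
path = "requests.jsonl.gz"
with gzip.open(path, "wt", encoding="utf-8") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

# Round-trip check that the file parses back line by line.
with gzip.open(path, "rt", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))
```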

There are a few things I can recommend, depending on what you're trying to do. If you simply want to evaluate some models, then I'd recommend the oe-eval repository, and...

Hi, thanks for sharing that additional information! The OLMES suite is actually suited for models on the Hugging Face Hub, as well as [local (or remote) paths](https://github.com/allenai/olmes/blob/47b90730b3f4abb4cb6c59b6e2ecf4122d9eb11b/oe_eval/configs/models.py#L6), so this should work...

No, we did not. We do not do generative GSM8K evaluations in-loop. The reason it appears as a multiple-choice task is that we are evaluating perplexity over the human-written...
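To make the perplexity-over-the-gold-answer idea concrete: given the model's per-token log-probabilities of the human-written answer, perplexity is `exp` of the average negative log-likelihood, and a lower value means the checkpoint fits the reference answer better. The numbers below are made up for illustration, not real GSM8K scores.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence from its per-token natural log-probs."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Made-up per-token log-probs of the gold (human-written) answer under
# two hypothetical checkpoints; no generation happens in this setup.
ckpt_a = [-0.2, -0.5, -0.1, -0.4]
ckpt_b = [-1.0, -1.2, -0.8, -1.1]

# Checkpoint A assigns higher probability to the reference answer.
print(perplexity(ckpt_a) < perplexity(ckpt_b))
```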

Hi, for using custom models, my message from last week still applies: > The OLMES suite is actually suited for models on the Hugging Face Hub, as well as [local (or...

Hi there, you're getting this error because you need to first convert your model into one of those formats, as I mentioned in my message above: > This will work...

Hi there! Thanks for the kind words and the inquiry! Our pretraining data was intentionally scoped down to include only English -- we used FastText classifiers to remove non-English text,...
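As a sketch of what that language-filtering step looks like: classify each document, then keep only confident English predictions. The real pipeline used FastText language-ID models (`model.predict(text)` on a loaded fastText model); here the classifier is replaced with a crude stand-in stub so the example is self-contained, and the threshold value is an assumption.

```python
def stub_langid(text: str):
    """Stand-in for a FastText language-ID model's predict() call.
    Returns (label, confidence). This crude ASCII heuristic is NOT the
    real classifier; real usage loads a fastText lid model instead."""
    ascii_ratio = sum(c.isascii() for c in text) / max(len(text), 1)
    label = "__label__en" if ascii_ratio > 0.9 else "__label__other"
    return label, ascii_ratio

def keep_english(docs, threshold=0.9):
    """Keep documents labeled English with confidence >= threshold."""
    kept = []
    for doc in docs:
        label, conf = stub_langid(doc)
        if label == "__label__en" and conf >= threshold:
            kept.append(doc)
    return kept

docs = ["The quick brown fox jumps.", "Быстрая лиса прыгает."]
print(keep_english(docs))
```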