Luca Soldaini
I would recommend running QuickUMLS on [WSL](https://learn.microsoft.com/en-us/windows/wsl/install)
Hey @epwalsh, added the 1B config and set the correct EOS token on both 1B and 7B. Didn't touch any data paths, lmk how you'd like to handle it.
yes @IanMagnusson I'm documenting the 1.5 creation process and will PR soon ❤️
Collected a first version of the corpus. Steps I followed are [here](https://github.com/allenai/LLM/blob/soldni/data/scripts/lucas/s2ag/README.md), but a summary is as follows: Data info: - Corpus is located at `s3://ai2-s2-research-public/lucas/s2orc_oa_2022_01_03` - It is comprised...
Same issue here.
I've stumbled upon this issue recently, too.
uh, that is pretty confusing! could you post a sample of the data in your yaml file?
hi @mihara-bot! could you give me more info on the system you are on? you shouldn't need to install rust under x86-64 to use dolma; pypi package should come...
Issues should have been fixed with #66.
This is nice; I will bump in the next version @peterbjorgensen! In the meantime, I recently added support for specifying rules using jq syntax (not the default, but can be...