dolma
Data and tools for generating and inspecting OLMo pre-training data.
22 dolma issues
I have been looking into https://github.com/allenai/dolma/blob/main/src/bloom_filter.rs, specifically at how it is thread-safe: ``` pub fn contains_hashes(&self, hashes: &Vec) -> bool { for hash in hashes { let hash = *hash as...
I would really love a proper contributing.md document, style guide, and precommit setup. _Originally posted by @chris-ha458 in https://github.com/allenai/dolma/issues/23#issuecomment-1685029747_