dolma

Data and tools for generating and inspecting OLMo pre-training data.

Results 22 dolma issues

I have been looking into https://github.com/allenai/dolma/blob/main/src/bloom_filter.rs, specifically at how it is thread-safe:

```
pub fn contains_hashes(&self, hashes: &Vec) -> bool {
    for hash in hashes {
        let hash = *hash as...
```
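For readers following the thread-safety question, here is a minimal sketch of one common way a Bloom filter can expose `&self` methods that are safe to call from several threads: store the bits in atomic words and set them with `fetch_or`. This is an illustration under assumptions, not a copy of dolma's code. `SketchBloomFilter`, `locate`, and `insert_concurrently` are hypothetical names, and the hashes are assumed to be precomputed `u64` values.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical, simplified filter for illustration only (not dolma's type).
pub struct SketchBloomFilter {
    // Each word holds 32 bits of the filter; atomics allow mutation via &self.
    bits: Vec<AtomicU32>,
}

impl SketchBloomFilter {
    pub fn new(num_words: usize) -> Self {
        assert!(num_words > 0, "filter needs at least one word");
        let bits = (0..num_words).map(|_| AtomicU32::new(0)).collect();
        Self { bits }
    }

    // Map a precomputed hash to a (word index, bit mask) pair.
    fn locate(&self, hash: u64) -> (usize, u32) {
        let total_bits = self.bits.len() as u64 * 32;
        let bit = hash % total_bits;
        ((bit / 32) as usize, 1u32 << (bit % 32))
    }

    // Set the bit for every hash. fetch_or is an atomic read-modify-write,
    // so two threads touching the same word cannot lose each other's bits.
    pub fn insert_hashes(&self, hashes: &[u64]) {
        for &hash in hashes {
            let (word, mask) = self.locate(hash);
            self.bits[word].fetch_or(mask, Ordering::Relaxed);
        }
    }

    // Return true only if the bit for every hash is set.
    pub fn contains_hashes(&self, hashes: &[u64]) -> bool {
        hashes.iter().all(|&hash| {
            let (word, mask) = self.locate(hash);
            (self.bits[word].load(Ordering::Relaxed) & mask) != 0
        })
    }
}

// Insert several batches from separate threads, sharing the filter via Arc.
pub fn insert_concurrently(filter: Arc<SketchBloomFilter>, batches: Vec<Vec<u64>>) {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let filter = Arc::clone(&filter);
            thread::spawn(move || filter.insert_hashes(&batch))
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}
```

In this sketch each word update is atomic, so there is no data race, but `Ordering::Relaxed` gives no ordering across words. A `contains_hashes` call that runs while another thread is halfway through `insert_hashes` for the same item may therefore still return `false`. Whether that is acceptable depends on how the filter is used.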

I would really love a proper contributing.md document, a style guide, and a pre-commit setup. _Originally posted by @chris-ha458 in https://github.com/allenai/dolma/issues/23#issuecomment-1685029747_