Backport improvements from the Dolma repo
@chris-ha458 has made some great improvements to BFF in the https://github.com/allenai/dolma repo. We should backport those changes here, especially the ones that affect correctness (such as those involving the choice of hash functions).
Chris' PRs are here:
- https://github.com/allenai/dolma/pull/23
- https://github.com/allenai/dolma/pull/24
- https://github.com/allenai/dolma/pull/31
- https://github.com/allenai/dolma/pull/35
- https://github.com/allenai/dolma/pull/39
They won't apply 1:1, because the code in the Dolma repo has since diverged, but at least the important changes, especially the correctness fixes, should carry over.