bff

Backport improvements from the Dolma repo

Open • dirkgr opened this issue 9 months ago • 1 comment

@chris-ha458 has made some great improvements to BFF in the https://github.com/allenai/dolma repo. We should backport those changes here, especially the ones that affect correctness (such as the choice of hash functions).

Chris' PRs are here:

  • https://github.com/allenai/dolma/pull/23
  • https://github.com/allenai/dolma/pull/24
  • https://github.com/allenai/dolma/pull/31
  • https://github.com/allenai/dolma/pull/35
  • https://github.com/allenai/dolma/pull/39

They won't apply 1:1 because the code in the Dolma repo has changed since, but at least the important changes should carry over.

dirkgr, Sep 20 '23 19:09