
Big OpenLM/DCLM <-> AI2 PR #1

Open revbucket opened this issue 2 months ago • 0 comments

Lots of changes here (this may be more of a refactor than a PR, but it will still require heavy code review and discussion about which changes to keep or fold in).

Summary of changes:

  • Added commands for bff and sysreq, to get a sense of how much memory a given BFF run will require
  • Changed the defaults of some arguments:
    • min-ngram/max-ngram now default to [20,20]
    • By default the bloom filter file is not saved (saving can be requested explicitly)
    • Annotations have been merged into a single argument
  • Added a progress bar (a no-progress-bar arg is also present)
  • Added more abstraction/functions to break things up and eventually avoid repeating code when I push the S3 PR
  • Added a BOTH-level removal type (there is some discussion of what this does in the RemoveType enum)
  • Added some printouts of BFF sparsity, removal rates, and time
  • Miscellaneous performance improvements, such as parallel iteration in some places
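
For reviewers who want a feel for the memory numbers and the sparsity printout, here is a minimal sketch built on the textbook Bloom filter sizing formulas. It is not taken from this PR: the function names (bloom_bits, optimal_num_hashes, expected_fill, bits_to_gib) are hypothetical, and the actual sysreq command may compute its estimate differently. It also assumes "sparsity" refers to the fraction of bits set in the filter.

```rust
// Hypothetical helpers (not BFF's actual API), using the standard
// Bloom filter formulas to estimate size, hash count, and fill rate.

use std::f64::consts::LN_2;

/// Number of bits m needed to hold `expected_items` entries at a target
/// false-positive rate `fp_rate`: m = -n * ln(p) / (ln 2)^2.
fn bloom_bits(expected_items: u64, fp_rate: f64) -> u64 {
    let n = expected_items as f64;
    (-n * fp_rate.ln() / (LN_2 * LN_2)).ceil() as u64
}

/// Number of hash functions that minimizes the false-positive rate
/// for the given size: k = (m / n) * ln 2, rounded, at least 1.
fn optimal_num_hashes(bits: u64, expected_items: u64) -> u32 {
    let k = (bits as f64 / expected_items as f64) * LN_2;
    k.round().max(1.0) as u32
}

/// Expected fraction of bits set after inserting `items` entries with
/// `num_hashes` hashes into `bits` bits: 1 - exp(-k * n / m).
fn expected_fill(bits: u64, num_hashes: u32, items: u64) -> f64 {
    let exponent = -(num_hashes as f64) * (items as f64) / (bits as f64);
    1.0 - exponent.exp()
}

/// Convert a bit count to GiB for a human-readable memory estimate.
fn bits_to_gib(bits: u64) -> f64 {
    bits as f64 / 8.0 / (1u64 << 30) as f64
}
```

Under these assumptions, calling bloom_bits with the expected number of n-grams and a target false-positive rate, then passing the result to bits_to_gib, gives a rough memory figure before launching a run. A filter sized this way should end up roughly half full, so a reported fill well above that suggests the filter was undersized for the data.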

revbucket, Apr 17 '24 18:04