Results: 7 bff issues

Lots of changes here (may be considered a refactor more than a PR, but will still require some heavy code reviews and discussion about which changes to keep/fold in). Summary...

Added `bff_v0.py`, a simple Python script that: 1) downloads all `.jsonl.gz` files from a specified S3 directory, 2) runs BFF on them, 3) uploads the outputs back to S3...

Several changes to main.rs: 1. Added progress bar printouts vs printouts at each filename (tried to use similar formatting as in `wimbd`) 2. Added directory support for inputs (can pass...

Thanks for sharing the great code! It has been very useful for me. I'm new to Rust and Bloom filters, and I have one question regarding the deduplication scope in...

@chris-ha458 has made some great improvements to BFF in the https://github.com/allenai/dolma repo. We should back-port those changes here, especially the ones that have to do with correctness (like the ones...

One thing that might be worth documenting when we get a chance is that the "bff_duplicate_spans" created by `--annotate-only` are byte spans rather than character spans as...

documentation
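
To illustrate why the byte-versus-character distinction in the issue above matters, here is a minimal Python sketch. The helper name `byte_span_to_char_span` is hypothetical and not part of BFF. It assumes the text is UTF-8 encoded and that both span boundaries fall on character boundaries.

```python
def byte_span_to_char_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Convert a [start, end) byte span over the UTF-8 encoding of `text`
    into the equivalent [start, end) character span.

    Hypothetical helper for illustration only. Assumes both offsets fall
    on UTF-8 character boundaries; otherwise decoding raises
    UnicodeDecodeError.
    """
    data = text.encode("utf-8")
    # The number of characters before a byte offset is the length of the
    # decoded prefix that ends at that offset.
    char_start = len(data[:start].decode("utf-8"))
    char_end = len(data[:end].decode("utf-8"))
    return char_start, char_end


text = "café au lait"            # "é" occupies two bytes in UTF-8
byte_span = (6, 8)               # bytes 6..8 encode "au"
char_span = byte_span_to_char_span(text, *byte_span)

# Slicing the string directly with byte offsets picks the wrong text.
wrong = text[byte_span[0]:byte_span[1]]
right = text[char_span[0]:char_span[1]]
```

For pure-ASCII text the two kinds of span coincide, so the mismatch only shows up once a document contains multi-byte characters.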

Hi @dirkgr! Here is a feature that would be very desirable for decontamination, but I'm not sure how difficult it would be to implement in BFF: The essential part...