Mehran Maghoumi

4 issues by Mehran Maghoumi

Besides the toy examples listed in the docs and tests, are there actual examples of this library available anywhere? I'm interested in using this library for a sequence labeling project,...

## Description This PR ensures that users can run the PEFT SDG tutorial using arbitrary API endpoints by exposing the URL that is used for synthetic data generation. ## Checklist...

**Describe the bug** When attempting to run fuzzy deduplication on a dataset that has no duplicates, the code errors out. **Steps/Code to reproduce bug** 1) Clone the repo 2) Run...

bug

I've been running some large-scale benchmarking with minhash deduplication on SLURM clusters, loosely following [this example](https://github.com/huggingface/datatrove/blob/main/examples/minhash_deduplication.py). The benchmarks consist of running stages 1 and 2 with the following configurations: *...