cryptowooser

2 issues by cryptowooser

Thanks for the awesome repo! I'm loving it so far. The current version of utils.py uses two functions that have been deprecated in more recent versions of the Pillow library,...

I'm trying to run [process_common_crawl_dump.py](https://github.com/huggingface/datatrove/blob/main/examples/process_common_crawl_dump.py) to dedupe an 80 GB megawarc I have, and the JSONL loader is taking a long time to load the data. It appears to be single-threaded...