datatrove
MinhashDedup should use the language parameter
https://github.com/huggingface/datatrove/blob/1e27cc8819465d5246d89cd929423b76eb0bc5dd/src/datatrove/pipeline/dedup/minhash.py#L196
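The issue does not spell out the fix. Here is a minimal sketch, assuming the language matters at the word-tokenization step that feeds the MinHash shingles. The hypothetical `get_shingles` below passes a `language` argument through to a per-language tokenizer. `TOKENIZERS`, `_word_tokenize` and `_char_tokenize` are illustrative stand-ins and are not datatrove's API.

```python
import re
from typing import Callable


# Hypothetical tokenizers for illustration only (not datatrove's implementation).
def _word_tokenize(text: str) -> list[str]:
    # Space-delimited languages: lowercase and keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def _char_tokenize(text: str) -> list[str]:
    # Languages written without spaces (e.g. Chinese): one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


# Hypothetical language -> tokenizer mapping; unknown languages fall back to word tokenization.
TOKENIZERS: dict[str, Callable[[str], list[str]]] = {
    "en": _word_tokenize,
    "zh": _char_tokenize,
}


def get_shingles(text: str, n_grams: int, language: str = "en") -> set[tuple[str, ...]]:
    """Build the set of n-gram shingles, tokenizing according to `language`."""
    tokenize = TOKENIZERS.get(language, _word_tokenize)
    tokens = tokenize(text)
    # Sliding window of n_grams tokens; yields nothing if there are fewer tokens than n_grams.
    return {tuple(tokens[i:i + n_grams]) for i in range(len(tokens) - n_grams + 1)}


if __name__ == "__main__":
    doc = "我爱北京天安门"
    print(get_shingles(doc, 2, language="en"))       # language ignored: no shingles
    print(len(get_shingles(doc, 2, language="zh")))  # language used: character bigrams
```

With the default `"en"` tokenizer, the unsegmented Chinese sentence becomes a single token. It therefore produces no bigram shingles at all, so the document would contribute nothing useful to deduplication. With `language="zh"`, the same text yields six character bigrams.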