skrub
Fixing the `TOKENIZERS_PARALLELISM` warning in the TextEncoder
This warning sometimes shows up when using the TextEncoder, but I can't reproduce it on demand, so I still need to write a reproducer.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
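For reference, the second option from the message can be sketched as below. This is only a minimal sketch that silences the warning; it says nothing about whether the fork is actually a problem. The helper name `set_tokenizers_parallelism` is hypothetical, and the sketch assumes the variable is set before the tokenizer is first used.

```python
import os


def set_tokenizers_parallelism(enabled: bool = False) -> str:
    """Set TOKENIZERS_PARALLELISM unless the user already chose a value.

    Hypothetical helper, not part of skrub or tokenizers.
    Returns the value that is in effect afterwards.
    """
    value = "true" if enabled else "false"
    # setdefault keeps any value the user exported in their shell.
    os.environ.setdefault("TOKENIZERS_PARALLELISM", value)
    return os.environ["TOKENIZERS_PARALLELISM"]


# Assumption: call this before the tokenizer is first used, so that worker
# processes started later inherit the variable through the environment.
set_tokenizers_parallelism(False)
```

Using `setdefault` rather than a plain assignment is a design choice: it avoids overriding a value that was set explicitly outside the script.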
Interesting. I would first investigate whether there is really a problem, or whether tokenizers is just worried about a plain fork when loky is actually doing fork+exec, which is safe (the same problem we had with polars)...
Thanks, I will investigate. Do you still have a link to the Polars issue or fix?
https://github.com/pola-rs/polars/issues/20255