
Fixing the `TOKENIZERS_PARALLELISM` warning in the TextEncoder

Open Vincent-Maladiere opened this issue 9 months ago • 3 comments

This warning sometimes shows up when using the TextEncoder, but I can't reproduce it on demand yet. I need to write a reproducer.

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
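For reference, the second option in the message can be applied from Python. This is only a minimal sketch: it assumes the variable is set early in the process, before any tokenization happens and before worker processes are started, and the choice of `"false"` is an assumption for illustration. It silences the warning but does not tell us whether the fork is actually a problem.

```python
import os

# Assumption: this runs early, before any tokenizer is used and before
# joblib/loky workers are created, so child processes inherit the value.
# setdefault keeps any value the user has already exported in their shell.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
```

The same effect can be obtained from the shell by exporting `TOKENIZERS_PARALLELISM=false` before launching Python.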

Vincent-Maladiere avatar Mar 03 '25 14:03 Vincent-Maladiere

Interesting. I would first investigate whether there is really a problem, or whether tokenizers is worried about a fork when loky is actually doing fork+exec, which is safe (the same problem we had with polars)...
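To illustrate the distinction between a plain fork and fork+exec, here is a rough, POSIX-only sketch using only the standard library. It does not involve `tokenizers` or loky at all: the `sys._parent_marker` attribute and both helper functions are hypothetical stand-ins for "state that already exists in the parent's memory", such as an initialized thread pool.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for in-memory state created in the parent process.
sys._parent_marker = True


def child_sees_marker_after_fork() -> bool:
    """Plain fork: the child starts as a copy of the parent's memory."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        os.write(write_fd, b"1" if hasattr(sys, "_parent_marker") else b"0")
        os._exit(0)  # exit without running the parent's cleanup handlers
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"1"


def child_sees_marker_after_fork_exec() -> bool:
    """New interpreter via exec: the child starts from a fresh process image."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hasattr(sys, '_parent_marker'))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

In the first case the child inherits the parent's in-memory state, which is the situation the warning is concerned with. In the second the child starts clean, which is why a fork immediately followed by exec is generally considered safe.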

jeromedockes avatar Mar 03 '25 14:03 jeromedockes

Thanks, I will investigate. Do you still have the reference for the fix/issue for Polars?

Vincent-Maladiere avatar Mar 03 '25 14:03 Vincent-Maladiere

https://github.com/pola-rs/polars/issues/20255

jeromedockes avatar Mar 03 '25 14:03 jeromedockes