tokenizers
tokenizers copied to clipboard
Fix unsoundness in `tokenizers::utils::parallelism`
Fixes #1491 by turning static mut USED_PARALLELISM: bool into a Mutex, i.e. static USED_PARALLELISM: Mutex<bool>. This allows the value to be accessed and modified safely and avoids unsoundness. There is a slight performance penalty, but given that the variable is only modified in the context of other parallel work, it is almost certainly negligible.
This PR fails several tests, but only the tests main also fails.
Hey sorry for the delay I'll try to review for next week!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@ArthurZucker I think #1532 is a superior solution, so this PR should probably just be closed
Thanks a lot for this PR and fixing the unsoundness (unsafe).
This PR seems even slightly better using Atomic instead (which are lock-free).https://github.com/huggingface/tokenizers/pull/1532