tokenizers icon indicating copy to clipboard operation
tokenizers copied to clipboard

Fix unsoundness in `tokenizers::utils::parallelism`

Open albertsgarde opened this issue 1 year ago • 2 comments

Fixes #1491 by turning static mut USED_PARALLELISM: bool into a Mutex, i.e. static USED_PARALLELISM: Mutex<bool>. This allows the value to be accessed and modified safely and avoids unsoundness. There is a slight performance penalty, but given that the variable is only modified in the context of other parallel work, it is almost certainly negligible.

This PR fails several tests, but only the tests main also fails.

albertsgarde avatar Apr 12 '24 10:04 albertsgarde

Hey sorry for the delay I'll try to review for next week!

ArthurZucker avatar May 23 '24 09:05 ArthurZucker

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker I think #1532 is a superior solution, so this PR should probably just be closed

albertsgarde avatar May 28 '24 09:05 albertsgarde

Thanks a lot for this PR and fixing the unsoundness (unsafe).

This PR seems even slightly better using Atomic instead (which are lock-free).https://github.com/huggingface/tokenizers/pull/1532

Narsil avatar Jun 06 '24 10:06 Narsil