[Misc] Use persistent thread pool

Open DarkSharpness opened this issue 3 months ago • 1 comments

Previously, each call to MultiThreadCompileGrammar created its own thread pool. This could be misleading, because the total number of active worker threads could significantly exceed the configured max_threads, reaching up to $n \times \text{max\_threads}$ for $n$ concurrent compilation tasks.

This PR changes the implementation to use a persistent thread pool shared across all compilation tasks in one compiler, ensuring that the number of worker threads stays within the specified limit. The pool is per compiler, not process-wide: each grammar compiler still has its own thread pool.

Note: This change may introduce performance regressions in scenarios where the old behavior implicitly allowed over-subscription of threads, as thread usage is now strictly bounded.

DarkSharpness avatar Sep 19 '25 08:09 DarkSharpness

Updated. cc @Ubospica @Seven-Streams. The rate-limit policy should be refined later to balance fairness (FIFO) against shortest-first (greedy execution). FIFO may cause head-of-line blocking (worse average latency), while shortest-first may starve grammars that need longer compilation (worse tail latency).

The old implementation created a new thread pool per compilation, which is closer to the latter, I guess.
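To make the trade-off concrete, here is a toy model, not taken from the PR. It assumes a single worker, all jobs arriving at time 0, and compile costs known in advance; `Simulate` and `ScheduleStats` are hypothetical names. It compares the mean completion time and the completion time of the most expensive job under both orderings.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy model: one worker, all jobs arrive at t = 0,
// and each job's cost is known up front.
struct ScheduleStats {
  double mean_completion;         // average completion time over all jobs
  double longest_job_completion;  // when the most expensive job finishes
};

ScheduleStats Simulate(std::vector<double> costs, bool shortest_first) {
  if (costs.empty()) return {0.0, 0.0};
  // FIFO keeps submission order; shortest-first sorts by ascending cost.
  if (shortest_first) std::stable_sort(costs.begin(), costs.end());
  const double longest = *std::max_element(costs.begin(), costs.end());
  double clock = 0.0;
  double total = 0.0;
  double longest_done = 0.0;
  for (double cost : costs) {
    clock += cost;   // this job finishes at the current clock value
    total += clock;
    if (cost == longest) longest_done = clock;
  }
  return {total / static_cast<double>(costs.size()), longest_done};
}
```

With costs `{8, 1, 1, 1}` submitted in that order, FIFO gives a mean completion time of 9.5 with the long job done at 8, while shortest-first gives a mean of 4.25 with the long job done at 11. With a steady stream of short arrivals, which this model leaves out, the long job could be pushed back indefinitely under shortest-first.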

DarkSharpness avatar Nov 07 '25 08:11 DarkSharpness