feat: Update logits bitmask kernel to v3
The XGrammar team provided important insights into the kernel workload: in most cases, the bitmask tensor is either almost full (nearly all bit values are 1) or almost empty (nearly all bit values are 0).
To replace the kernel version currently on main (v2), this PR introduces the kernel developed in https://github.com/mlc-ai/xgrammar/pull/186 (v3). Compared with v2:
- Kernel v3 shows ~1.3x and ~2.0x speedups at large batch sizes for the almost-full and almost-empty scenarios, respectively.
- Kernel v3 is slightly slower in the half-full scenario.
See https://github.com/mlc-ai/xgrammar/tree/main/examples/benchmark#benchmark-apply-token-bitmask-inplace-kernels for more perf numbers. Please see https://github.com/mlc-ai/xgrammar/pull/186 for more background.
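
For readers unfamiliar with the operation, here is a minimal CPU reference sketch of what applying a token bitmask to logits means. It is not the v2 or v3 kernel. It assumes the mask is packed into 32-bit words with one bit per vocabulary token, that a 1 bit means the token is allowed, and that disallowed logits are set to negative infinity. The names `apply_token_bitmask` and `fill_ratio` are hypothetical helpers for illustration only.

```python
import math

BITS_PER_WORD = 32  # assumption: the mask is packed into 32-bit words


def _bit(bitmask_words, token_id):
    """Return the mask bit (0 or 1) for a single token id."""
    word = bitmask_words[token_id // BITS_PER_WORD]
    return (word >> (token_id % BITS_PER_WORD)) & 1


def apply_token_bitmask(logits, bitmask_words):
    """Return a copy of logits with disallowed tokens (bit == 0) set to -inf."""
    out = list(logits)
    for token_id in range(len(out)):
        if _bit(bitmask_words, token_id) == 0:
            out[token_id] = -math.inf
    return out


def fill_ratio(bitmask_words, vocab_size):
    """Fraction of tokens allowed by the mask (1.0 = full, 0.0 = empty)."""
    allowed = sum(_bit(bitmask_words, t) for t in range(vocab_size))
    return allowed / vocab_size


if __name__ == "__main__":
    vocab_size = 40
    logits = [1.0] * vocab_size
    # Word 0 allows tokens 0..31; word 1 allows only token 32.
    mask = [0xFFFFFFFF, 0x1]
    masked = apply_token_bitmask(logits, mask)
    print(fill_ratio(mask, vocab_size), masked[32], masked[33])
```

In these terms, the almost-full and almost-empty scenarios correspond to a fill ratio near 1.0 and near 0.0, and the half-full scenario to a ratio near 0.5.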
/bot run
PR_Github #253 [ run ] triggered by Bot
/bot run
PR_Github #292 [ run ] triggered by Bot
PR_Github #253 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #248 completed with status: 'FAILURE'
PR_Github #292 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #281 completed with status: 'FAILURE'
/bot run
PR_Github #310 [ run ] triggered by Bot
PR_Github #310 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #294 completed with status: 'FAILURE'
/bot run
PR_Github #347 [ run ] triggered by Bot
PR_Github #347 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #319 completed with status: 'FAILURE'
/bot run
PR_Github #433 [ run ] triggered by Bot
PR_Github #433 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #371 completed with status: 'FAILURE'
/bot run
PR_Github #442 [ run ] triggered by Bot
PR_Github #442 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #378 completed with status: 'FAILURE'
/bot run
PR_Github #491 [ run ] triggered by Bot
PR_Github #491 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #423 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #527 [ reuse-pipeline ] triggered by Bot
PR_Github #527 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #491 for commit 60fd55d
/bot reuse-pipeline
PR_Github #535 [ reuse-pipeline ] triggered by Bot
PR_Github #535 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #491 for commit ff297de