
[Bug] DeepSeek inference on H20 hangs when flashinfer MLA is enabled

Open ProphetPeng opened this issue 11 months ago • 6 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [ ] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • [ ] 5. Please use English, otherwise it will be closed.

Describe the bug

Using sglang (v0.4.3) to serve DeepSeek-R1 on 8x H20 with flashinfer MLA enabled, the server hangs while flashinfer is loading its JIT ops.

Reproduction

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --trust-remote-code --enable-flashinfer-mla --disable-radix-cache --tp 8

Environment

8x H20 GPUs

ProphetPeng avatar Feb 14 '25 09:02 ProphetPeng

Try to install the latest version of flashinfer and remove ~/.cache/flashinfer.

ispobock avatar Feb 14 '25 09:02 ispobock
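For reference, a minimal sketch of that cleanup. The package name is an assumption: recent flashinfer releases are published on PyPI as flashinfer-python, while older ones installed as flashinfer from the project's wheel index, so pick the build matching your CUDA/torch versions.

pip install -U flashinfer-python   # upgrade flashinfer (package name varies by release)
rm -rf ~/.cache/flashinfer         # drop cached JIT-compiled ops that can cause the load hang

The JIT ops are recompiled on the next launch, so expect a slower first startup after clearing the cache.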

ref https://github.com/flashinfer-ai/flashinfer/issues/825#issuecomment-2658773255

zhyncs avatar Feb 14 '25 09:02 zhyncs

@ProphetPeng Hi, is there any update? I'm hitting the same problem (sglang 0.4.3 + flashinfer 0.2.1.post1). @ispobock Hi, I tried reinstalling flashinfer 0.2.1.post1, but the problem still exists.

ICENacl avatar Feb 15 '25 08:02 ICENacl

Try to install the latest version of flashinfer and remove ~/.cache/flashinfer.

Thanks, it works for me. However, it's slower than the Triton kernel on H20 for short contexts.

ProphetPeng avatar Feb 15 '25 10:02 ProphetPeng
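For a side-by-side comparison, the same model can be launched with and without the flag. As the comment above implies, omitting --enable-flashinfer-mla in v0.4.3 keeps the default Triton MLA kernels (a sketch based on the repro command):

# Triton MLA kernels (default):
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --trust-remote-code --tp 8

# FlashInfer MLA:
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --trust-remote-code --enable-flashinfer-mla --disable-radix-cache --tp 8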

Does flashinfer have better performance?

YangZeyu95 avatar Feb 16 '25 03:02 YangZeyu95

Does flashinfer have better performance?

Because the radix cache is disabled in the current version, performance is reduced for typical inputs and outputs. Flashinfer is effective for long-context input scenarios and improves throughput there. @YangZeyu95

lambert0312 avatar Feb 17 '25 00:02 lambert0312
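To verify the long-context claim, a throughput benchmark with long prompts is the relevant test. Below is a sketch using sglang's bundled serving benchmark; the random-dataset flag names are assumptions and may differ across versions, so check python3 -m sglang.bench_serving --help:

# ~8k-token prompts with short outputs, to stress long-context prefill
python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 8192 --random-output 256 --num-prompts 100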

@lambert0312 @ProphetPeng Please try pulling the latest main branch; --enable-flashinfer-mla and the radix cache can now be used together.

Fridge003 avatar Feb 26 '25 20:02 Fridge003
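On a build from the current main branch, the --disable-radix-cache flag from the original repro command can therefore be dropped (a sketch, assuming the same 8x H20 setup):

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --trust-remote-code --enable-flashinfer-mla --tp 8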

@lambert0312 @ProphetPeng Please try pulling the latest main branch; --enable-flashinfer-mla and the radix cache can now be used together.

@Fridge003 I've verified it; no problems.

lambert0312 avatar Mar 02 '25 09:03 lambert0312