[Bug] DeepSeek inference on H20 hangs when flashinfer MLA is enabled
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [ ] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [ ] 5. Please use English, otherwise it will be closed.
Describe the bug
Running deepseek-r1 with sglang (v0.4.3) on 8 H20 GPUs with flashinfer MLA enabled hangs while flashinfer is loading its JIT ops.
Reproduction
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --trust-remote-code --enable-flashinfer-mla --disable-radix-cache --tp 8
Environment
8 H20
Try to install the latest version of flashinfer and remove ~/.cache/flashinfer.
ref https://github.com/flashinfer-ai/flashinfer/issues/825#issuecomment-2658773255
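The cache-clearing half of this suggestion can be scripted. Below is a minimal sketch, assuming the JIT cache lives at `~/.cache/flashinfer` as stated above; the helper name `clear_flashinfer_cache` is hypothetical and not part of flashinfer or sglang. The upgrade step is left out because the right install command depends on your CUDA and torch versions.

```python
import shutil
from pathlib import Path

# Default location taken from the comment above; adjust if your cache lives elsewhere.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "flashinfer"


def clear_flashinfer_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Delete the flashinfer JIT cache directory (hypothetical helper).

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    if cache_dir.is_dir():
        # Remove the directory and all cached build artifacts under it.
        shutil.rmtree(cache_dir)
        return True
    return False
```

Calling `clear_flashinfer_cache()` once before relaunching the server should have the same effect as removing `~/.cache/flashinfer` by hand.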
@ProphetPeng Hi, is there any update? I hit the same problem (sglang 0.4.3 + flashinfer 0.2.1.post1).
@ispobock Hi, I tried reinstalling flashinfer 0.2.1.post1, but the problem still exists.
> Try to install the latest version of flashinfer and remove ~/.cache/flashinfer.
Thanks, it works for me. But it's slower than the Triton kernel on H20 for short contexts.
Does flashinfer have better performance?
Because the radix cache is disabled in the current version, performance drops for general input and output workloads. Flashinfer is effective for long-context inputs, where it improves throughput. @YangZeyu95
@lambert0312 @ProphetPeng Please try pulling the latest main branch; --enable-flashinfer-mla and the radix cache can now be used together.
@Fridge003 I've verified it, no problem.