Lil2J issues

Results: 3 issues by Lil2J

[BUG] Deepspeed inference does not support the Qwen model

**Describe the bug** I use `deepspeed.init_inference` to accelerate inference of the Qwen model. Compared with running without `deepspeed.init_inference`, I see no speedup. Then...

bug

inference

Using nvfp4 + kvcached + sglang results in a type mismatch error

When I use nvfp4 + kvcached + sglang, the following error occurs:

      File "/usr/local/lib/python3.12/dist-packages/sglang/srt/layers/attention/base_attn_backend.py", line 91, in forward
        return self.forward_decode(
               ^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 815, in forward_decode
        forward_batch.token_to_kv_pool.set_kv_buffer(
      File "/usr/local/lib/python3.12/dist-packages/sglang/srt/mem_cache/memory_pool.py",...

Using kvcached + sglang + qwen-fp8 directly causes an out-of-bounds error. [bug]

I’m currently using an image-based setup to start kvcached + sglang, and now I want to use the Qwen3 FP8 model. The inference framework can be successfully deployed, but as...

bug