elvischenv

8 comments by elvischenv

How did you install the DeepEP kernel? Which CUDA version are you using? I tried https://github.com/vllm-project/vllm/tree/main/tools/ep_kernels, but it seems some of the build steps need CUDA 12.9 while others need...

@varun-sundar-rabindranath I did use that script but encountered errors:

```
-- Configuring done (1.2s)
-- Generating done (0.4s)
-- Build files have been written to: /workspace/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_build
+ cmake --build /workspace/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_build/...
```

Yes, I am using `TORCH_CUDA_ARCH_LIST="10.0" bash install_python_libraries.sh`. The `nvcc fatal : Unsupported gpu architecture 'compute_70'` error is not related to `uv`, right? That error should be related to the nvshmem version. The...
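For reference, a minimal sketch of how pinning the arch list avoids the `compute_70` fallback; the CMake variable mapping below is my assumption for illustration, not something the vLLM scripts are confirmed to do:

```shell
# Hypothetical sketch: restrict the CUDA architectures NVCC targets so a
# build does not fall back to a default list that includes compute_70
# (dropped by newer CUDA toolkits). "10.0" corresponds to Blackwell/B200.
export TORCH_CUDA_ARCH_LIST="10.0"

# CMake-based builds take the equivalent setting via the standard
# CMAKE_CUDA_ARCHITECTURES cache variable, which uses "100" rather than
# "10.0"; strip the dot to convert between the two conventions.
CMAKE_CUDA_ARCHS="${TORCH_CUDA_ARCH_LIST//./}"   # "10.0" -> "100"
echo "Would pass -DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHS}"
```

If nvshmem still emits `compute_70`, the arch list is likely hard-coded in its own build files, in which case only a newer nvshmem release helps.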

Seems this PR broke the original `--async-scheduling` on B200:

```
VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1 vllm serve openai/gpt-oss-120b --async-scheduling
```

```
vllm bench serve --model openai/gpt-oss-120b --dataset-name random --ignore-eos --max-concurrency 1 --num-prompts 10 --random-input-len...
```

@FlamingoPg

> May I ask how long a single tuning run takes now?

For B200 + gpt-oss-120b, it takes about 1 min in my local test:

```
[2025-10-29 02:42:01] Running FlashInfer autotune......
```

Hi @FlamingoPg, it seems that the CI failures are not related to my PR. Could you help confirm? Thanks!

> https://github.com/sgl-project/sglang/actions/runs/19458190143/job/55706006309?pr=12306
> @elvischenv do you think this failure is related to this PR?

Should be related to a PR that was merged 2 weeks ago: #11133. Pushed a fix and...

Hi @Qiaolin-Yu @FlamingoPg @Fridge003, could you help us merge this PR? The CI failures are all unrelated. Thanks!