elvischenv

8 comments by elvischenv

How did you install the DeepEP kernel? Which CUDA version are you using? I tried https://github.com/vllm-project/vllm/tree/main/tools/ep_kernels, but it seems some of the build steps need CUDA 12.9 while others need...

@varun-sundar-rabindranath I did use that script but encountered errors:

```
-- Configuring done (1.2s)
-- Generating done (0.4s)
-- Build files have been written to: /workspace/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_build
+ cmake --build /workspace/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_build/...
```

Yes, I am using `TORCH_CUDA_ARCH_LIST="10.0" bash install_python_libraries.sh`. The `nvcc fatal : Unsupported gpu architecture 'compute_70'` error is not related to `uv`, right? That error should be related to the nvshmem version. The...
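For reference, a minimal sketch of how pinning the arch list avoids the `compute_70` fallback; the CMake variable mapping below is my assumption for illustration, not something the vLLM scripts are confirmed to do:

```shell
# Hypothetical sketch: restrict the CUDA architectures NVCC targets so a
# build does not fall back to a default list that includes compute_70
# (dropped by newer CUDA toolkits). "10.0" corresponds to Blackwell/B200.
export TORCH_CUDA_ARCH_LIST="10.0"

# CMake-based builds take the equivalent setting via the standard
# CMAKE_CUDA_ARCHITECTURES cache variable, which uses "100" rather than
# "10.0"; strip the dot to convert between the two conventions.
CMAKE_CUDA_ARCHS="${TORCH_CUDA_ARCH_LIST//./}"   # "10.0" -> "100"
echo "Would pass -DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHS}"
```

If nvshmem still emits `compute_70`, the arch list is likely hard-coded in its own build files, in which case only a newer nvshmem release helps.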

Seems this PR broke the original `--async-scheduling` on B200:

```
VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1 vllm serve openai/gpt-oss-120b --async-scheduling
```

```
vllm bench serve --model openai/gpt-oss-120b --dataset-name random --ignore-eos --max-concurrency 1 --num-prompts 10 --random-input-len...
```

@FlamingoPg

> May I ask how long a single tuning run takes now?

For B200 + gpt-oss-120b, it takes about 1 min in my local test:

```
[2025-10-29 02:42:01] Running FlashInfer autotune......
```

Hi @FlamingoPg, it seems that the CI failures are not related to my PR. Could you help confirm? Thanks!

> https://github.com/sgl-project/sglang/actions/runs/19458190143/job/55706006309?pr=12306
> @elvischenv do you think this failure is related to this PR?

Should be related to a PR that was merged 2 weeks ago: #11133. Pushed a fix and...

Hi @Qiaolin-Yu @FlamingoPg @Fridge003, could you help us merge this PR? The CI failures are all unrelated. Thanks!