Varun Sundar Rabindranath

Results: 30 comments by Varun Sundar Rabindranath

> I think it makes sense to just round up to multiples of 16. Power of 2 could be too aggressive. I'll update the PR to see if that is...
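
As a quick illustration of the tradeoff in that quote (my assumption: the rounding pads a size, e.g. a LoRA rank, before kernel dispatch), rounding up to a multiple of 16 typically wastes much less than rounding up to the next power of 2. A minimal sketch:

```python
# Sketch only: compares padding to the next multiple of 16 vs the next
# power of 2. The helper names are hypothetical, not from the PR.
def round_up_to_multiple(x: int, multiple: int = 16) -> int:
    return ((x + multiple - 1) // multiple) * multiple

def round_up_to_power_of_2(x: int) -> int:
    return 1 << (x - 1).bit_length()

for r in (20, 40, 100):
    print(r, round_up_to_multiple(r), round_up_to_power_of_2(r))
# 20 -> 32 vs 32, 40 -> 48 vs 64, 100 -> 112 vs 128
```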

Hey guys, looks like this PR broke the Ultravox LoRA tests. Both the V0 and V1 `tests/lora/test_ultravox.py` tests were failing. I can't seem to repro the V0 test failure locally. I can...

The `VLLM_ALL2ALL_BACKEND="deepep_high_throughput"` backend is the only code path using the `trtllm_fp4_block_scale_routed_moe` API from flashinfer. @nvjullin

```
python3 -m vllm.entrypoints.openai.api_server --model openai/gpt-oss-20b \
    --data-parallel-size 2 --tensor-parallel-size 1 --enable-expert-parallel \
    --no-enable-prefix-caching --port 8080 --max-model-len 8192...
```
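
For completeness, a minimal sketch of launching that same server programmatically with the backend pinned via the environment variable. The env var name and all CLI flags come from the comment above; the `subprocess` wrapper is purely illustrative:

```python
# Sketch: launch the vLLM OpenAI-compatible server with the DeepEP
# high-throughput all2all backend selected via the environment.
import os
import subprocess

env = dict(os.environ, VLLM_ALL2ALL_BACKEND="deepep_high_throughput")
subprocess.run(
    [
        "python3", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "openai/gpt-oss-20b",
        "--data-parallel-size", "2",
        "--tensor-parallel-size", "1",
        "--enable-expert-parallel",
        "--no-enable-prefix-caching",
        "--port", "8080",
        "--max-model-len", "8192",
    ],
    env=env,
    check=True,
)
```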

> How do you install deepep kernel? Which cuda version are you using? I tried https://github.com/vllm-project/vllm/tree/main/tools/ep_kernels, but seems some of the building steps need cuda 12.9 but some of them...

@elvischenv how are you invoking the scripts? Like this:

```
# for hopper
TORCH_CUDA_ARCH_LIST="9.0" bash install_python_libraries.sh

# for blackwell
TORCH_CUDA_ARCH_LIST="10.0" bash install_python_libraries.sh
```

Also, if you are using `uv`...

Hey guys. Sorry about the delay in getting a minimal repro - I have one now, PTAL. Thanks.

```
import flashinfer
import torch
from flashinfer import trtllm_fp4_block_scale_routed_moe
from flashinfer import...
```

Thanks @tdoublep. The implementation looks clean and non-invasive. I left a few refactoring comments; other than that, it looks good to me 🙌

Thanks @jeejeelee. Can you also update the "supported models" page for LoRA, please? https://github.com/vllm-project/vllm/blob/9f1710f1ace3535920c0bb6d4cc329c36289080e/docs/source/models/supported_models.md?plain=1#L339

@tlrmchlsmth - I have the changes here https://github.com/neuralmagic/vllm/pull/57, waiting to be merged into the `neuralmagic:cutlass-moe-bf16-weights` branch. I am still gathering the e2e and microbenchmark results.

Factoring out expert_map support into a separate PR https://github.com/vllm-project/vllm/pull/16861