Varun Sundar Rabindranath comments

Results: 30 comments of Varun Sundar Rabindranath

[Kernel] Initial Activation Quantization Support

@bnellnm I added some quant kernel tests. We should definitely add some model tests.

[BugFix] Fix topk_softmax assert

Thanks for the fix @ProExpertProg 🙌

[Kernel] Masked act_mul and fp8-quant Kernels for Batched MoE

Marking this as a draft. These kernels are not a priority at the moment, given that a masked-fused-act-mul-quant kernel exists in the ll_deepgemm_opt branch (https://github.com/vllm-project/vllm/tree/ll_deepgemm_opt). We can revive this when needed.

[Core] Enable CUDA graphs for DP + All2All kernels

@bnellnm @youkaichao @tlrmchlsmth PTAL! Thanks.

[Kernel] AQ AZP 3/4: Asymmetric quantization kernels

LGTM! Left a few nit comments.

[CI/Build] Per file CUDA Archs (improve wheel size and dev build times)

LGTM! This is a nice refactor! Thanks @LucasWilkinson

[V1] LoRA - Add triton kernels for V1

> All the LoRA tests have failed again

Looking into this now 👍

[V1] LoRA - Add triton kernels for V1

Update: I enabled the tests in `tests/lora/test_layers.py` for V1. The tests pass locally but OOM on the CI. I am tracking this down.

[V1] LoRA - Add triton kernels for V1

> It seems these modifications have significantly increased the time consumption for lora testing
> ![image](https://private-user-images.githubusercontent.com/19733142/418847014-be6080be-abc4-4c3e-b913-c9b1fa2de95f.png)

Yes. This PR adds the v1_kernel tests in `test_punica_ops.py` and enables `test_layers.py` to run for...

Multi-Step + Chunked Prefill with Prefill Stepping

Hey @sam-h-bean, thanks for testing this out! Can you please share the commands you used to test? It'd help get a repro quickly.