Varun Sundar Rabindranath
@bnellnm I added some quant kernel tests. We should definitely add some model tests.
Thanks for the fix @ProExpertProg 🙌
Marking this as a draft. These kernels are not a priority at the moment, given that a masked-fused-act-mul-quant kernel exists in https://github.com/vllm-project/vllm/tree/ll_deepgemm_opt. We can revive this when needed.
@bnellnm @youkaichao @tlrmchlsmth PTAL! Thanks.
LGTM! Left a few nit comments.
LGTM! This is a nice refactor! Thanks @LucasWilkinson
> All the LoRA tests have failed again

Looking into this now 👍
Update: I enabled the tests in `tests/lora/test_layers.py` for V1. The tests pass locally but OOM on the CI. I am tracking this down.
> It seems these modifications have significantly increased the time consumption for lora testing

Yes. This PR adds the v1_kernel tests in `test_punica_ops.py` and enables `test_layers.py` to run for...