Dipika Sikka

Results: 27 comments by Dipika Sikka

@robertgshaw2-neuralmagic I think updating the `get_scheme` function is beyond the scope of this PR. I'd like to first land the use of compressed-tensors without any dependency conflicts. Refactoring `get_scheme` should be a...

> Just left one quick comment. I'm going to pull this PR in and try it with a compressed-tensors W8A16 model.

Confirmed this works with compressed-tensors W8A16.

Still need to test with a DeepSeek-V2 model.
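A minimal smoke test along those lines, using vLLM's offline `LLM` API with a placeholder model name (the actual checkpoints tested are not named here), might look like:

```python
# Quick sanity check that a compressed-tensors quantized checkpoint loads and
# generates. The model name is a placeholder, not the checkpoint from the PR.
from vllm import LLM, SamplingParams

llm = LLM(model="org/model-w8a16-compressed-tensors")  # hypothetical W8A16 model
params = SamplingParams(temperature=0.0, max_tokens=32)

for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```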

@ElizaWszola seems like the kernel test failures start after `tests/kernels/test_moe.py` - could you take a look?
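One way to check whether those failures are order-dependent (state leaking out of the MoE tests) is to run the suspect file in isolation and then the rest of the kernel tests; a rough sketch, assuming a standard pytest setup from the repo root:

```python
# Run the MoE kernel tests alone, then the full kernels suite, to see whether
# the downstream failures only appear after test_moe.py has run.
import pytest

pytest.main(["-q", "tests/kernels/test_moe.py"])
pytest.main(["-q", "tests/kernels"])
```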

> This LGTM but have you verified that DeepSeek MoE is okay with this PR?

Yes: verified with DeepSeek, Mixtral, and Qwen.

@mgoin I can't resolve the conversations, but I've addressed all but one comment.

Latency benchmarking with two 80GB A100s:
```
Mixtral Fused MoE with AWQ:
Avg latency: 1.3650233147976298 seconds
10% percentile latency: 1.3638953405432404 seconds
25% percentile latency: 1.3643284866120666 seconds
50% percentile latency: ...
```
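For context, percentile figures like these come from timing repeated end-to-end generations; a rough sketch of that measurement, assuming vLLM's offline `LLM` API and a placeholder AWQ Mixtral checkpoint rather than the project's benchmark script:

```python
# Crude latency measurement in the spirit of the numbers above. The model name,
# prompt, and iteration count are assumptions for illustration only.
import time

import numpy as np
from vllm import LLM, SamplingParams

llm = LLM(model="org/Mixtral-8x7B-AWQ", quantization="awq")  # hypothetical AWQ checkpoint
params = SamplingParams(temperature=0.0, max_tokens=128)
prompt = ["Hello, my name is"]

latencies = []
for _ in range(30):  # warmup iterations omitted for brevity
    start = time.perf_counter()
    llm.generate(prompt, params)
    latencies.append(time.perf_counter() - start)

print(f"Avg latency: {np.mean(latencies)} seconds")
for pct in (10, 25, 50):
    print(f"{pct}% percentile latency: {np.percentile(latencies, pct)} seconds")
```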