Dipika Sikka
Note: Splitting this PR into two separate PRs. PR 1/2: https://github.com/vllm-project/vllm/pull/7334
@shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
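For illustration, a minimal sketch of that swap, assuming the model is being loaded through vLLM's `LLM` entry point (the model path and prompt here are placeholders; keep whatever you were using):

```python
import torch
from vllm import LLM, SamplingParams

# Same load call as before, but with bfloat16 instead of half.
# The model path below is a placeholder; dtype can also be passed as the string "bfloat16".
llm = LLM(model="Qwen/Qwen2.5-72B-Instruct", dtype=torch.bfloat16)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```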
@Muennighoff Thank you for the PR! Do you mind rebasing this off of main?
Do you mind sharing the outputs you get when running the compressed model through transformers?
> > > Is this issue solved? I got same problem in Qwen2.5-72B-Instruct-GPTQ-Int8
> >
> > This issue has not been resolved yet, and due to the...
> > Do you mind sharing the outputs you get when running the compressed model through transformers?
>
> I have re-quantized qwen2.5-72b (all parameters) using compressed-tensors. Below are its...
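For reference, a minimal sketch of checking a compressed-tensors checkpoint's outputs directly through transformers (the checkpoint path and prompt are placeholders; loading a quantized compressed-tensors checkpoint assumes the `compressed-tensors` package is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the re-quantized compressed-tensors checkpoint.
model_id = "path/to/qwen2.5-72b-compressed-tensors"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Greedy decoding so the outputs are directly comparable across runs.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```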