Dipika Sikka
Note: Splitting this PR into two separate PRs. PR 1/2: https://github.com/vllm-project/vllm/pull/7334
@shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
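For illustration, a minimal sketch of that swap, assuming the model is being loaded through vLLM's `LLM` entry point (the model path and prompt here are placeholders; keep whatever you were using):

```python
import torch
from vllm import LLM, SamplingParams

# Same load call as before, but with bfloat16 instead of half.
# The model path below is a placeholder; dtype can also be passed as the string "bfloat16".
llm = LLM(model="Qwen/Qwen2.5-72B-Instruct", dtype=torch.bfloat16)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```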
@Muennighoff Thank you for the PR! Do you mind rebasing this off of main?
Do you mind sharing the outputs you get when running the compressed model through transformers?
> > > Is this issue solved? I got same problem in Qwen2.5-72B-Instruct-GPTQ-Int8
> >
> > This issue has not been resolved yet, and due to the...
> > Do you mind sharing the outputs you get when running the compressed model through transformers?
>
> I have re-quantized qwen2.5-72b (all parameters) using compressed-tensors. Below are its...
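For reference, a minimal sketch of checking a compressed-tensors checkpoint's outputs directly through transformers (the checkpoint path and prompt are placeholders; loading a quantized compressed-tensors checkpoint assumes the `compressed-tensors` package is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the re-quantized compressed-tensors checkpoint.
model_id = "path/to/qwen2.5-72b-compressed-tensors"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Greedy decoding so the outputs are directly comparable across runs.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```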