Casper

Results: 293 comments by Casper

Looks like the authors do not plan to support MPT models.

> @TheBloke or anyone else
>
> Do you know what `with act-order` means? From what I get, it means to sort the "activations" before quantizing them. But the trouble...
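For intuition, here is a toy sketch of the column-reordering idea the question describes: quantize the input channels with the largest activations first. This is not GPTQ's actual error-compensating procedure, and the function names and the per-channel statistic are assumptions chosen purely for illustration.

```python
# Illustrative sketch only (NOT the GPTQ/AutoGPTQ implementation): "act-order"
# processes weight columns in order of decreasing activation magnitude, so the
# most influential input channels are handled first.
import numpy as np

def act_order_permutation(activations: np.ndarray) -> np.ndarray:
    """Column indices sorted by mean squared activation, descending.

    `activations` is assumed to be a (num_samples, in_features) calibration batch.
    """
    importance = (activations ** 2).mean(axis=0)   # per-input-channel statistic (assumed)
    return np.argsort(-importance)                 # most active channels first

def quantize_with_act_order(weight: np.ndarray, activations: np.ndarray, n_bits: int = 4):
    """Toy symmetric round-to-nearest quantization applied in activation order."""
    perm = act_order_permutation(activations)
    w = weight[:, perm]                            # (out_features, in_features), columns permuted
    scale = np.abs(w).max(axis=0, keepdims=True) / (2 ** (n_bits - 1) - 1)
    q = np.clip(np.round(w / scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    inv_perm = np.argsort(perm)                    # undo the permutation for the caller
    return (q * scale)[:, inv_perm], perm
```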

> I am able to get reasonable result from FasterTransformer + MPT-7B-Storywriter with 2 changes to FasterTransformer

Thanks for documenting this. Not sure if FasterTransformer would accept a PR for...

> It looks as though we should already support it out of the box, when `quantization_config` is in the model's HF config.json. (modulo potential issues arising due to us attempting...

I have an example script that works with Mixtral: https://github.com/casper-hansen/AutoAWQ/blob/main/examples/basic_vllm.py
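For context, a minimal sketch of the pattern that script follows: load an AWQ-quantized checkpoint with vLLM and run a single generation. The model name and sampling settings below are placeholders, not a copy of `basic_vllm.py`.

```python
# Minimal sketch (placeholder model name and settings; not the contents of basic_vllm.py).
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/mixtral-instruct-awq",  # assumed AWQ checkpoint; substitute your own
    quantization="awq",
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```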

I just used the `runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04` Docker image and ran `pip install vllm`.

Could you try the Docker image I referenced to see if it's an environment issue?

Not sure if this relates to #2203. Does it work in FP16 with TP > 1?

Tagging @WoosukKwon @zhuohan123 for visibility. Seems Mixtral has issues with TP > 1 when using AWQ.
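If it helps with a repro, here is a rough sketch of the configuration being discussed, assuming at least two GPUs and the same placeholder AWQ checkpoint as above. It is not a confirmed failing case, just the shape of the setup.

```python
# Rough repro sketch (placeholder checkpoint; requires >= 2 GPUs): AWQ + tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/mixtral-instruct-awq",  # assumed AWQ checkpoint
    quantization="awq",
    tensor_parallel_size=2,                     # TP > 1 is where the issue reportedly appears
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```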

Woosuk said it should be fixed in the new 0.2.7 release by PR #2208. Could someone verify with the AWQ version? Reference: https://github.com/vllm-project/vllm/issues/2332#issuecomment-18761736055
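Before re-running the AWQ + TP > 1 test, a quick sanity check of the installed version might save a false negative (assumes the `packaging` package is available):

```python
# Quick version check before re-testing the AWQ + TP > 1 configuration.
from packaging.version import Version
import vllm

assert Version(vllm.__version__) >= Version("0.2.7"), "upgrade vLLM before re-testing"
```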