AI Apprentice
I have the same error when trying to load `TheBloke/Llama-2-7b-Chat-GPTQ`
This is awesome! Thank you for the great work. It's very much needed as we keep up to date with vLLM. Hope it can be merged soon.
@sihanwang41 Thank you for your reply. I saw there's an RFC related to integrating a queuing system into Ray Serve: https://github.com/ray-project/ray/issues/32292. So I was wondering if that's something Ray-LLM would...
@irasin we are sharing the same paper lol~
It works just fine for me. For GPU, you need to seek other solutions (https://discuss.pytorch.org/t/does-dynamic-quantization-support-gpu/119231).
I just used the code from the quantization section in the readme of this repo.
Please consider supporting quantized models, such as GPTQ, AWQ, etc.
Thank you. I just tried version `0.51.8`, which I believe is the most recent release. Unfortunately, it still throws the same validation error.
Thank you for your reply. I think I figured out where the issue comes from. I'm using the Mistral-7B model, which doesn't support the "system" role. I used a custom [chat...