AI Apprentice

9 comments by AI Apprentice

I have the same error when trying to load `TheBloke/Llama-2-7b-Chat-GPTQ`

This is awesome! Thank you for the great work. It's very much needed as we keep up to date with vLLM. Hope it can be merged soon.

@sihanwang41 Thank you for your reply. I saw there's an RFC related to integrating a queuing system into Ray Serve: https://github.com/ray-project/ray/issues/32292. So I was wondering if that's something Ray-LLM would...

@irasin we are sharing the same paper lol~

It works just fine for me. For GPU, you need to seek other solutions (https://discuss.pytorch.org/t/does-dynamic-quantization-support-gpu/119231).
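For reference, a minimal sketch of the CPU-only dynamic quantization being discussed, assuming a plain PyTorch model with `nn.Linear` layers (the tiny model here is illustrative, not from the repo):

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Illustrative model with a single Linear layer to quantize."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)


model = TinyNet().eval()

# Dynamic quantization: weights are converted to int8 ahead of time,
# activations are quantized on the fly. This path runs on CPU only,
# which is the limitation mentioned above.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(2, 16))  # inference still works with float inputs
```

Moving `qmodel` to CUDA does not make the quantized kernels run on GPU; for GPU inference one needs a different scheme (e.g. GPTQ/AWQ weight-only quantization), as the linked thread explains.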

I just used the code from the quantization section in the readme of this repo.

Please consider supporting quantized models, such as GPTQ, AWQ, etc.

Thank you. I just tried version `0.51.8` which I think is the most recent release. Unfortunately, it still throws the same validation error.

Thank you for your reply. I think I figured out where the issue comes from. I'm using the Mistral-7B model, which doesn't support the "system" role. I used a custom [chat...
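A minimal sketch of one common workaround, assuming the goal is to feed OpenAI-style messages to a template that only accepts alternating user/assistant turns (the helper name and merge format are my own, not from any library):

```python
def to_mistral_messages(messages):
    """Fold any "system" message into the next user turn, since
    Mistral's chat template rejects the "system" role outright."""
    out = []
    pending_system = None
    for msg in messages:
        if msg["role"] == "system":
            pending_system = msg["content"]
        elif msg["role"] == "user" and pending_system is not None:
            # Prepend the system instructions to the user's first message.
            out.append({
                "role": "user",
                "content": pending_system + "\n\n" + msg["content"],
            })
            pending_system = None
        else:
            out.append(msg)
    return out


converted = to_mistral_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
```

After this conversion, every message uses a role the template accepts, so the validation error above no longer triggers.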