AI Apprentice
I have the same error when trying to load `TheBloke/Llama-2-7b-Chat-GPTQ`
This is awesome! Thank you for the great work. It's very much needed as we keep up to date with vLLM. Hope it can be merged soon.
@sihanwang41 Thank you for your reply. I saw there's an RFC related to integrating a queuing system into Ray Serve: https://github.com/ray-project/ray/issues/32292. So I was wondering if that's something Ray-LLM would...
@irasin we are sharing the same paper lol~
It works just fine for me. For GPU, you need to seek other solutions (https://discuss.pytorch.org/t/does-dynamic-quantization-support-gpu/119231).
I just used the code from the quantization section in the readme of this repo.
Please consider supporting quantized models, such as GPTQ, AWQ, etc.
Thank you. I just tried version `0.51.8`, which I believe is the most recent release. Unfortunately, it still throws the same validation error.
Thank you for your reply. I think I figured out where the issue comes from. I'm using the Mistral-7B model, which doesn't support the "system" role. I used a custom [chat...