Eric Buehler
It looks like this issue is caused by a loss of precision on long context windows: https://github.com/huggingface/transformers/pull/29285
Yes, that is the plan.
Yes! We are beginning work on this topic now.
@NiuBlibing, we have llama3 support ready: the README has a few examples. I will add Qwen support shortly.
@NiuBlibing, I just added Qwen2 support. Quantized Qwen2 support will be added in the next few days.
> Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

@cargecla1, yes! It will be a great use case for ISQ.
> Hello!
> Any plans for adding multimodal (e.g. llava) and embedding models?

@francis2tm, yes. I plan on supporting Llava and embedding models this week.
@NiuBlibing, you can run Qwen now with ISQ, which will quantize it.
@kir-gadjello

> Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 with an option to compute visual frontend model on CPU.

You might find it easier...
Candle-lora can be applied to any model by adding the derive and attribute macros to each struct containing a linear, embedding, or conv layer. Therefore, it should be possible to...