Eric Buehler

Results: 543 comments by Eric Buehler

It looks like this issue is caused by precision loss on long context windows: https://github.com/huggingface/transformers/pull/29285

Yes, that is the plan.

Yes! We are beginning work on this topic now.

@NiuBlibing, we have llama3 support ready: the README has a few examples. I will add Qwen support shortly.

@NiuBlibing, I just added Qwen2 support. Quantized Qwen2 support will be added in the next few days.

> Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

@cargecla1, yes! It will be a great use case for ISQ.

> Hello!
> Any plans for adding multimodal (e.g. llava) and embedding models?

@francis2tm, yes. I plan on supporting Llava and embedding models this week.

@NiuBlibing, you can run Qwen now with ISQ, which will quantize it.

@kir-gadjello

> Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 with an option to compute visual frontend model on CPU.

You might find it easier...

Candle-lora can be applied to any model by adding the derive and attribute macros to each struct containing a linear, embedding, or conv layer. Therefore, it should be possible to...