text-generation-inference
SpQR discussion
Feature request
https://github.com/Vahe1994/SpQR
@TimDettmers has already announced the new SpQR technique on Twitter, which compresses models to about 3.35 bits per parameter. I am watching the progress, and the authors plan to add inference code in the near future.
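For context, the core idea behind SpQR is to quantize most weights to a very low bit-width while keeping a small fraction of outlier weights in higher precision as a sparse component. The snippet below is only a minimal NumPy sketch of that general idea, not the authors' implementation (which works group-wise and also quantizes the quantization parameters themselves); all function names in it are made up for illustration:

```python
import numpy as np

def toy_spqr_quantize(w, bits=3, outlier_frac=0.01):
    """Quantize most weights uniformly to `bits` bits and keep the
    largest-magnitude weights ("outliers") exact in a separate fp16 tensor.

    Toy illustration only; a real implementation would store the
    outliers in a sparse format such as CSR.
    """
    levels = 2 ** bits - 1  # e.g. 7 representable steps for 3 bits
    # Treat the top `outlier_frac` of weights by magnitude as outliers.
    threshold = np.quantile(np.abs(w), 1.0 - outlier_frac)
    outlier_mask = np.abs(w) > threshold
    outliers = np.where(outlier_mask, w, 0.0).astype(np.float16)

    # Uniform min-max quantization of the remaining dense weights.
    base = np.where(outlier_mask, 0.0, w)
    zero = base.min()
    scale = (base.max() - zero) / levels
    q = np.round((base - zero) / scale).astype(np.uint8)
    return q, scale, zero, outliers

def toy_spqr_dequantize(q, scale, zero, outliers):
    base = q.astype(np.float32) * scale + zero
    # Overwrite outlier positions with their exactly stored values.
    return np.where(outliers != 0, outliers.astype(np.float32), base)

w = np.random.randn(256, 256).astype(np.float32)
q, scale, zero, outliers = toy_spqr_quantize(w)
w_hat = toy_spqr_dequantize(q, scale, zero, outliers)
print("max absolute reconstruction error:", np.abs(w - w_hat).max())
```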
Motivation
https://github.com/huggingface/text-generation-inference/pull/438
Since there is already a PR for GPTQ, which still needs more than 4 bits per parameter even at 4-bit precision, I thought you might be open to this new technique as well.
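To make the expected saving concrete, a quick back-of-the-envelope comparison of weight memory (the 65B parameter count below is just an assumed example, and each bit figure is treated as an all-in average per parameter):

```python
# Rough weight-memory footprint; ignores activations and runtime buffers.
params = 65e9
for name, bits in [("fp16", 16.0), ("GPTQ ~4-bit", 4.0), ("SpQR ~3.35-bit", 3.35)]:
    print(f"{name:>15}: {params * bits / 8 / 1e9:.1f} GB")
# -> fp16: 130.0 GB, GPTQ ~4-bit: 32.5 GB, SpQR ~3.35-bit: 27.2 GB
```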
Your contribution
I only tried the inference server for the first time today (shame on me) and immediately rolled it out as a replacement in some production deployments, so I am still very new to this code base. Once the inference code is published I could try adapting it to this repository, but if the authors themselves or the inference server team already plan to take that on, I definitely won't be sad about it 😁
Hello! Thanks for your interest in this repo and in SpQR! We are communicating with Tim to add SpQR to TGI as soon as possible, so stay tuned :)
Hi Olivier, just out of interest: did they give you any information about the timeline? It seems like the dedup PR for SpQR will be merged soon, and hopefully the model-saving functionality will follow quickly.
https://github.com/Vahe1994/SpQR/pull/32
Just found that model saving is WIP
This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.