text-generation-inference
SpQR discussion
Feature request
https://github.com/Vahe1994/SpQR
@TimDettmers has already announced the new SpQR technique on Twitter, which compresses models to about 3.35 bits per parameter. I am watching the progress, and the authors plan to add inference code in the near future.
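For context, the core idea behind SpQR is to quantize most weights to a very low bit-width while keeping a small fraction of outlier weights in higher precision as a sparse component. The snippet below is only a minimal NumPy sketch of that general idea, not the authors' implementation (which works group-wise and also quantizes the quantization parameters themselves); all function names in it are made up for illustration:

```python
import numpy as np

def toy_spqr_quantize(w, bits=3, outlier_frac=0.01):
    """Quantize most weights uniformly to `bits` bits and keep the
    largest-magnitude weights ("outliers") exact in a separate fp16 tensor.

    Toy illustration only; a real implementation would store the
    outliers in a sparse format such as CSR.
    """
    levels = 2 ** bits - 1  # e.g. 7 representable steps for 3 bits
    # Treat the top `outlier_frac` of weights by magnitude as outliers.
    threshold = np.quantile(np.abs(w), 1.0 - outlier_frac)
    outlier_mask = np.abs(w) > threshold
    outliers = np.where(outlier_mask, w, 0.0).astype(np.float16)

    # Uniform min-max quantization of the remaining dense weights.
    base = np.where(outlier_mask, 0.0, w)
    zero = base.min()
    scale = (base.max() - zero) / levels
    q = np.round((base - zero) / scale).astype(np.uint8)
    return q, scale, zero, outliers

def toy_spqr_dequantize(q, scale, zero, outliers):
    base = q.astype(np.float32) * scale + zero
    # Overwrite outlier positions with their exactly stored values.
    return np.where(outliers != 0, outliers.astype(np.float32), base)

w = np.random.randn(256, 256).astype(np.float32)
q, scale, zero, outliers = toy_spqr_quantize(w)
w_hat = toy_spqr_dequantize(q, scale, zero, outliers)
print("max absolute reconstruction error:", np.abs(w - w_hat).max())
```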
Motivation
https://github.com/huggingface/text-generation-inference/pull/438
Since there is already a PR for GPTQ, which still needs more than 4 bits per parameter even at 4-bit precision, I thought you might be open to this new technique as well.
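To make the expected saving concrete, a quick back-of-the-envelope comparison of weight memory (the 65B parameter count below is just an assumed example, and each bit figure is treated as an all-in average per parameter):

```python
# Rough weight-memory footprint; ignores activations and runtime buffers.
params = 65e9
for name, bits in [("fp16", 16.0), ("GPTQ ~4-bit", 4.0), ("SpQR ~3.35-bit", 3.35)]:
    print(f"{name:>15}: {params * bits / 8 / 1e9:.1f} GB")
# -> fp16: 130.0 GB, GPTQ ~4-bit: 32.5 GB, SpQR ~3.35-bit: 27.2 GB
```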
Your contribution
I only tried the inference server for the first time today (shame on me) and immediately rolled it out as a replacement in some production deployments, so I am still very new to this code base. Once the inference code is published I could try adapting it to this repository, but if the authors themselves or the inference server team already plan to take that on, I definitely won't be sad about it 😁
Hello! Thanks for your interest in this repo and in SpQR! We are communicating with Tim to add SpQR to TGI as soon as possible, so stay tuned :)
Hi Olivier, just out of interest: did they give you any information about the timeline? It seems like the dedup PR for SpQR will be merged soon, and hopefully the model-saving functionality will follow quickly.
https://github.com/Vahe1994/SpQR/pull/32
Just found that model saving is WIP
This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.