
Adapter support

QLutz opened this issue 1 year ago

Feature request

Enable the use of locally stored adapters created by huggingface/peft. Ideally, this should be compatible with the most notable benefits of TGI (e.g. sharding and flash attention).

Motivation

Using models fine-tuned with PEFT is currently possible only by merging the adapter back into the original weights of the model. This is especially cumbersome in terms of disk space for use cases where the user has many adapters for a single base model.
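For reference, the merge-based workaround looks roughly like this with PEFT (all model and adapter paths below are placeholders):

```python
# A minimal sketch of the merge-based workaround described above,
# assuming a LoRA adapter saved locally by huggingface/peft.
# "my-base-model" and "./my-adapter" are placeholder paths.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("my-base-model")
model = PeftModel.from_pretrained(base, "./my-adapter")

# Fold the LoRA weights into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()

# Every adapter served this way needs a full copy of the merged model on
# disk, which is the disk-space problem described above.
merged.save_pretrained("./my-merged-model")
```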

Your contribution

I'm not sure how much work this would involve or whether it is feasible at all (notably, enabling sharding with adapters). I'll gladly read any insights on the complexity and relevance of adding this feature.

QLutz avatar May 30 '23 12:05 QLutz

Hello! This is something that we want to support in the future, but our bandwidth is very limited at the moment.

OlivierDehaene avatar May 30 '23 15:05 OlivierDehaene

Here's a WIP/PoC of loading an adapter model via PEFT: https://github.com/ohmytofu-ai/tgi-angryface/commit/aba56c1343aa77ba0a07d14327d3e52736334308.

This is addressing https://github.com/huggingface/text-generation-inference/issues/896#issuecomment-1691770960. I cannot test hot-swapping right now since I'm trying to finish my LlamaModel inference server, and it seems to be missing load_adapter methods, which I guess I'm going to implement.
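For context, a minimal sketch of what such hot-swapping looks like on the PEFT side, using load_adapter and set_adapter (model and adapter paths are placeholders):

```python
# A minimal adapter hot-swapping sketch using PEFT's load_adapter /
# set_adapter, assuming two LoRA adapters trained on the same base model.
# "my-base-model", "./adapter-a" and "./adapter-b" are placeholder paths.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("my-base-model")
model = PeftModel.from_pretrained(base, "./adapter-a", adapter_name="a")

# A second adapter can be attached without duplicating the base weights.
model.load_adapter("./adapter-b", adapter_name="b")

# Switch the active adapter between requests.
model.set_adapter("b")
```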

chris-aeviator avatar Aug 27 '23 15:08 chris-aeviator

I am also very interested. @OlivierDehaene: Are there any updates on when it will be implemented?

rudolpheric avatar Sep 26 '23 17:09 rudolpheric

Instead of supporting more models, I think we should get this working first. I am interested in this. S-LoRA is already out there.

xiaoyunwu avatar Nov 24 '23 21:11 xiaoyunwu

Everything has actually been working for quite a while now.

Closing this.

Narsil avatar Nov 27 '23 12:11 Narsil

@Narsil Unless I missed something (in which case I'd be very grateful for any pointers), TGI does not yet support loading multiple adapters for a single base model simultaneously.

Some mechanisms have been implemented that automate merging an adapter into its base model (PR #935), but that is more of a (most welcome!) convenience feature than the more ambitious adapter support (best represented in the SOTA today by S-LoRA) first described in this thread.
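To make the distinction concrete, here is a toy sketch of the core S-LoRA idea: one shared set of base weights, with each request in a batch applying its own low-rank correction. Real S-LoRA adds custom CUDA kernels and unified paging; all shapes here are illustrative, and the LoRA scaling factor is omitted.

```python
# Toy illustration: a heterogeneous batch where each request targets its
# own adapter, computed over a single shared base weight W.
import torch

d, r = 8, 2                      # hidden size, LoRA rank (illustrative)
W = torch.randn(d, d)            # shared base weight
adapters = {                     # per-adapter low-rank factors (A_i, B_i)
    "a": (torch.randn(r, d), torch.randn(d, r)),
    "b": (torch.randn(r, d), torch.randn(d, r)),
}

x = torch.randn(4, d)                  # a batch of 4 requests
batch_adapters = ["a", "b", "b", "a"]  # adapter requested by each row

out = x @ W.T                    # base projection, computed once per batch
for i, name in enumerate(batch_adapters):
    A, B = adapters[name]
    out[i] += B @ (A @ x[i])     # per-request low-rank correction

print(out.shape)  # torch.Size([4, 8])
```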

Obviously, the final decision will come from your end, but I think many of us would like to know if you plan on adding this to TGI.

QLutz avatar Nov 27 '23 14:11 QLutz

@Narsil vLLM recently merged a multi-LoRA feature into their main branch. Perhaps this ticket should be reopened?
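For reference, a minimal sketch of vLLM's multi-LoRA API around that release, assuming a locally stored LoRA adapter (the model id and adapter path are placeholders):

```python
# A minimal sketch of vLLM multi-LoRA serving, assuming a vLLM version
# with LoRA support (~0.3.0). Model id and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

# Each request can carry its own LoRARequest(name, unique_int_id, path);
# vLLM batches requests for different adapters over one shared base model.
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/my-adapter"),
)
print(outputs[0].outputs[0].text)
```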

mhillebrand avatar Jan 26 '24 20:01 mhillebrand