
Adapter support

QLutz opened this issue 1 year ago

Feature request

Enable the use of locally stored adapters created by huggingface/peft. Ideally, this should be compatible with the most notable benefits of TGI (e.g. sharding and flash attention).

Motivation

Using models fine-tuned with PEFT is currently possible only by merging the adapter back into the original weights of the model. This is especially cumbersome in terms of disk space for use cases where the user has many adapters for a single base model.
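For reference, the merge-based workaround looks roughly like this with PEFT (all model and adapter paths below are placeholders):

```python
# A minimal sketch of the merge-based workaround described above,
# assuming a LoRA adapter saved locally by huggingface/peft.
# "my-base-model" and "./my-adapter" are placeholder paths.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("my-base-model")
model = PeftModel.from_pretrained(base, "./my-adapter")

# Fold the LoRA weights into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()

# Every adapter served this way needs a full copy of the merged model on
# disk, which is the disk-space problem described above.
merged.save_pretrained("./my-merged-model")
```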

Your contribution

I'm not sure how much work this would involve or whether it is feasible at all (notably, enabling sharding with adapters). I'll gladly read any insights on the complexity and relevance of adding this feature.

QLutz avatar May 30 '23 12:05 QLutz

Hello! This is something that we want to support in the future, but our bandwidth is very limited at the moment.

OlivierDehaene avatar May 30 '23 15:05 OlivierDehaene

Here's a WIP/PoC of loading an adapter model via PEFT: https://github.com/ohmytofu-ai/tgi-angryface/commit/aba56c1343aa77ba0a07d14327d3e52736334308.

This is addressing https://github.com/huggingface/text-generation-inference/issues/896#issuecomment-1691770960. I cannot test hot-swapping right now since I'm trying to finish my LlamaModel inference server, and it seems to be missing load_adapter methods, which I guess I'm going to implement.
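For context, a minimal sketch of what such hot-swapping looks like on the PEFT side, using load_adapter and set_adapter (model and adapter paths are placeholders):

```python
# A minimal adapter hot-swapping sketch using PEFT's load_adapter /
# set_adapter, assuming two LoRA adapters trained on the same base model.
# "my-base-model", "./adapter-a" and "./adapter-b" are placeholder paths.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("my-base-model")
model = PeftModel.from_pretrained(base, "./adapter-a", adapter_name="a")

# A second adapter can be attached without duplicating the base weights.
model.load_adapter("./adapter-b", adapter_name="b")

# Switch the active adapter between requests.
model.set_adapter("b")
```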

chris-aeviator avatar Aug 27 '23 15:08 chris-aeviator

I am also very interested. @OlivierDehaene: Are there any updates on when it will be implemented?

rudolpheric avatar Sep 26 '23 17:09 rudolpheric

Instead of supporting more models, I think we should get this working first. I am interested in this. S-LoRA is already out there.

xiaoyunwu avatar Nov 24 '23 21:11 xiaoyunwu

Everything has actually been working for quite a while now.

Closing this.

Narsil avatar Nov 27 '23 12:11 Narsil

@Narsil Unless I missed something (in which case I'd be very grateful for any pointers), TGI does not yet support loading multiple adapters for a single base model simultaneously.

Some mechanisms have been implemented that automate merging an adapter into its base model (PR #935), but that is more of a (most welcome!) convenience feature than the more ambitious adapter support (best represented in the SOTA today by S-LoRA) first described in this thread.
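To make the distinction concrete, here is a toy sketch of the core S-LoRA idea: one shared set of base weights, with each request in a batch applying its own low-rank correction. Real S-LoRA adds custom CUDA kernels and unified paging; all shapes here are illustrative, and the LoRA scaling factor is omitted.

```python
# Toy illustration: a heterogeneous batch where each request targets its
# own adapter, computed over a single shared base weight W.
import torch

d, r = 8, 2                      # hidden size, LoRA rank (illustrative)
W = torch.randn(d, d)            # shared base weight
adapters = {                     # per-adapter low-rank factors (A_i, B_i)
    "a": (torch.randn(r, d), torch.randn(d, r)),
    "b": (torch.randn(r, d), torch.randn(d, r)),
}

x = torch.randn(4, d)                  # a batch of 4 requests
batch_adapters = ["a", "b", "b", "a"]  # adapter requested by each row

out = x @ W.T                    # base projection, computed once per batch
for i, name in enumerate(batch_adapters):
    A, B = adapters[name]
    out[i] += B @ (A @ x[i])     # per-request low-rank correction

print(out.shape)  # torch.Size([4, 8])
```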

Obviously, the final decision will come from your end, but I think many of us would like to know if you plan on adding this to TGI.

QLutz avatar Nov 27 '23 14:11 QLutz

@Narsil vLLM recently merged a multi-LoRA feature into their main branch. Perhaps this ticket should be reopened?
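For reference, a minimal sketch of vLLM's multi-LoRA API around that release, assuming a locally stored LoRA adapter (the model id and adapter path are placeholders):

```python
# A minimal sketch of vLLM multi-LoRA serving, assuming a vLLM version
# with LoRA support (~0.3.0). Model id and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

# Each request can carry its own LoRARequest(name, unique_int_id, path);
# vLLM batches requests for different adapters over one shared base model.
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/my-adapter"),
)
print(outputs[0].outputs[0].text)
```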

mhillebrand avatar Jan 26 '24 20:01 mhillebrand