[FastGen] Hot-swappable LoRA adapters?

Open • corbt opened this issue 2 years ago • 1 comment

Hey there! FastGen seems really awesome. I'm curious whether the roadmap includes support for serving models with LoRA adapters. Our use case is that we have hundreds of different LoRAs we need to serve, and keeping a fully merged model live on GPUs for each of them at all times isn't feasible. It would be awesome if FastGen implemented something like S-LoRA so we can serve requests from multiple LoRAs simultaneously!

Nov 08 '23 04:11 corbt

Thanks for the suggestion! I don't have a concrete timeline for something like this yet, but I do think this is a great feature for us to support moving forward, and I will work to establish a roadmap to integrate it.

Nov 08 '23 17:11 cmikeh2