DeepSpeed-MII
[FastGen] Hot-swappable LoRA adapters?
Hey there! FastGen seems really awesome. I'm curious whether the roadmap includes support for serving models with LoRA adapters? Our use case is that we have hundreds of different LoRAs we need to serve, and keeping the fully merged models live on GPUs at all times isn't feasible. It would be awesome if FastGen implemented something like S-LoRA so we can serve requests from multiple LoRAs simultaneously!
Thanks for the suggestion! I don't have a concrete timeline for something like this yet, but I do think this is a great feature for us to support moving forward and will work to establish a roadmap to integrate it.