FIX Allow same layer adapters on different devices
Resolves #1639
The issue is that, so far, PEFT has assumed that all adapter weights of a given layer are on the same device. There are cases where it is useful to have adapters on different devices, e.g. when a user loads many LoRA adapters and wants to offload the ones not currently in use to CPU; this is currently not possible.
With this PR, we add this possibility. To achieve this, when we update an adapter layer with a new adapter, we only move that specific adapter to the device of the base layer, without touching the other loaded adapters.
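A minimal sketch of the intended workflow, assuming a LoRA model whose base layers live on GPU. The helper `move_lora_adapter` below is hypothetical (not a PEFT API); it just walks the model's LoRA submodules and moves one adapter's weights to another device, which is the kind of usage this PR is meant to allow:

```python
# Hypothetical helper: move only the weights of `adapter_name` to `device`,
# leaving the base layer and any other loaded adapters untouched.
import torch
from peft.tuners.lora import LoraLayer


def move_lora_adapter(model, adapter_name, device):
    for module in model.modules():
        if isinstance(module, LoraLayer):
            # lora_A / lora_B are ModuleDicts keyed by adapter name
            for module_dict in (module.lora_A, module.lora_B):
                if adapter_name in module_dict:
                    module_dict[adapter_name].to(device)


# Example usage (paths and adapter names are placeholders):
# model = PeftModel.from_pretrained(base_model, "path/to/adapter_a")   # "default"
# model.load_adapter("path/to/adapter_b", adapter_name="other")
# move_lora_adapter(model, "other", torch.device("cpu"))   # park the unused adapter on CPU
```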
While working on this, I discovered a small bug in VeRA when adding multiple adapters, which is now also fixed.
This PR has the potential to lead to unforeseen issues, so careful review is required. After merging, let's keep it out of releases for a while to ensure it doesn't break anything.
Thanks for the reviews. I'll wait a bit in case @iuliaturc has time to check if this PR fixes the initial issue. If we don't hear back, I'll merge in a few days.
We should let this PR "simmer" for a bit since there is a small probability that this will break some edge case we haven't thought of.
Thanks so much for the PR! I left a comment here.
TL;DR is that, indeed, only one LoRA seems to be loaded at a time, but the fix doesn't seem to address the original problem (that latency keeps creeping up the more calls we make).
Thanks for the confirmation.