FIX Allow same-layer adapters on different devices

Open BenjaminBossan opened this pull request 1 year ago • 2 comments

Resolves #1639

The issue is that, so far, PEFT has made the assumption that all adapter weights of a specific layer are on the same device. There are cases where it is useful to have adapters on different devices: e.g., when a user loads many LoRA adapters and wants to offload those not currently in use to the CPU, they are unable to do so today (see the sketch below).
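For illustration, here is a minimal sketch of the usage pattern this PR enables. The adapter paths and names are placeholders, and `offload_adapter` is a hypothetical helper written for this example, not a PEFT API; it just walks the LoRA layers and moves one adapter's weights by hand.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.tuners.lora import LoraLayer

base = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

# Load two LoRA adapters (paths and names are placeholders).
model = PeftModel.from_pretrained(base, "path/to/adapter-0", adapter_name="adapter-0")
model.load_adapter("path/to/adapter-1", adapter_name="adapter-1")
model.set_adapter("adapter-0")

def offload_adapter(model, adapter_name, device="cpu"):
    """Hypothetical helper: move one adapter's LoRA weights to another device."""
    for module in model.modules():
        if isinstance(module, LoraLayer) and adapter_name in module.lora_A:
            module.lora_A[adapter_name].to(device)
            module.lora_B[adapter_name].to(device)

# Offload the inactive adapter; with this PR, later adapter updates no longer
# pull it back onto the base layer's device.
offload_adapter(model, "adapter-1")
```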

With this PR, we add this possibility. To achieve this, when we update an adapter layer with a new adapter, we move only that specific adapter to the device of the base layer, without touching the other loaded adapters.
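Conceptually, the change looks like the following simplified sketch. This is not the actual PEFT code; the class and method names merely mimic PEFT's LoRA layer conventions (`lora_A`/`lora_B` dicts, an `update_layer` method) to show where the device move happens.

```python
import torch.nn as nn

class LoraLayerSketch(nn.Module):
    """Simplified sketch of a LoRA layer holding multiple adapters."""

    def __init__(self, base_layer: nn.Linear):
        super().__init__()
        self.base_layer = base_layer
        self.lora_A = nn.ModuleDict()
        self.lora_B = nn.ModuleDict()

    def update_layer(self, adapter_name: str, r: int):
        in_f, out_f = self.base_layer.in_features, self.base_layer.out_features
        self.lora_A[adapter_name] = nn.Linear(in_f, r, bias=False)
        self.lora_B[adapter_name] = nn.Linear(r, out_f, bias=False)
        # Before this PR: moving the whole layer relocated *all* adapters to
        # the base layer's device. Now only the newly added adapter is moved,
        # so previously loaded adapters can stay offloaded, e.g. on CPU.
        device = self.base_layer.weight.device
        self.lora_A[adapter_name].to(device)
        self.lora_B[adapter_name].to(device)
```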

While working on this, I discovered a small bug in VeRA when adding multiple adapters, which is now also fixed.

This PR has the potential to lead to unforeseen issues, so careful review is required. After merging this, let's keep it out of releases for a while to ensure it doesn't break anything.

BenjaminBossan avatar May 17 '24 13:05 BenjaminBossan

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Thanks for the reviews. I'll wait a bit in case @iuliaturc has time to check if this PR fixes the initial issue. If we don't hear back, I'll merge in a few days.

We should let this PR "simmer" for a bit since there is a small probability that this will break some edge case we haven't thought of.

BenjaminBossan avatar May 21 '24 13:05 BenjaminBossan

Thanks so much for the PR! I left a comment here.

TL;DR is that, indeed, only one LoRA seems to be loaded at a time, but the fix doesn't seem to address the original problem (that latency keeps creeping up the more calls we make).

iuliaturc avatar May 22 '24 23:05 iuliaturc

Thanks for the confirmation.

BenjaminBossan avatar May 23 '24 08:05 BenjaminBossan