Add LoRA support to Stable Diffusion to enable faster fine-tuning
Short Description
LoRA was proposed in "LoRA: Low-Rank Adaptation of Large Language Models". For a good introduction to LoRA, refer to this section.
The idea with this feature would be to allow users to fine-tune only the LoRA-added parameters and thereby save a significant amount of compute, as demonstrated here and here.
Papers
Provided at the beginning.
Existing Implementations
- https://github.com/cloneofsimo/lora
- https://github.com/huggingface/diffusers/pull/1884/files (more relevant, I guess, as the Stable Diffusion KerasCV port partially refers to the diffusers library)
Other Information
It would be better to have this as a general feature. Other large transformer-based models will benefit from it.
@cloneofsimo has recently shared some interesting TODOs at: https://github.com/cloneofsimo/lora/discussions/138
I spent some time implementing LoRA to replace Dense layers. It's straightforward to implement tbh, but the biggest challenge IMO is the ability to easily replace a pretrained Dense layer (or EinsumDense layer) in a complex model graph.
import tensorflow as tf
from tensorflow.keras import layers

class LoRADense(layers.Layer):
    # Low-rank adapter branch: a rank-r "down" projection followed by an "up"
    # projection, scaled by alpha / r as in the LoRA paper.
    def __init__(self, units, r=4, lora_alpha=1.0, use_bias=False, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.r = r
        self.use_bias = use_bias
        self.scale = lora_alpha / self.r
        # The down projection starts with small random weights and the up projection
        # with zeros, so the adapter is a no-op at initialization.
        down_initializer = tf.keras.initializers.RandomNormal(stddev=1 / self.r)
        up_initializer = tf.keras.initializers.Zeros()
        self.lora_down = layers.Dense(self.r, use_bias=self.use_bias, kernel_initializer=down_initializer)
        self.lora_up = layers.Dense(self.units, use_bias=self.use_bias, kernel_initializer=up_initializer)

    def call(self, inputs):
        return self.lora_up(self.lora_down(inputs)) * self.scale
LoRA at its core, is a reparametrization trick. But for it to be useful, it should be an easy drop-in replacement.
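One rough sketch of the drop-in part, assuming the LoRADense above: a wrapper layer (the name and wiring here are illustrative, not a settled design) that freezes the pretrained Dense layer it is given and adds the LoRA branch on top, so only the LoRA parameters receive gradients.

class LoRAWrappedDense(layers.Layer):
    # Illustrative wrapper: keeps the pretrained Dense layer frozen and sums its
    # output with the trainable low-rank LoRADense branch.
    def __init__(self, dense_layer, r=4, lora_alpha=1.0, **kwargs):
        super().__init__(**kwargs)
        dense_layer.trainable = False  # only the LoRA branch is updated
        self.dense = dense_layer
        self.lora = LoRADense(dense_layer.units, r=r, lora_alpha=lora_alpha)

    def call(self, inputs):
        return self.dense(inputs) + self.lora(inputs)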
I also found the https://github.com/BenWhetton/keras-surgeon repository by @BenWhetton, which can perform operations like deleting, inserting, and replacing layers. However, this older repository might not work with the latest TF/Keras releases.
There was also a repackaged version published a few months ago by @cs-jsi, but the GitHub repo it points to is no longer available: https://pypi.org/project/tf2-keras-surgeon/
https://github.com/keras-team/keras-cv/issues/1275#issuecomment-1385447112
@ayulockin Would be nice to use the LoRADense in an end-to-end case. That should include freezing the original layers (whatever that means for the pre-trained model in question), updating the matrices introduced by the LoRA scheme, and finally merging the LoRA parameters with the original weights, if that makes sense.
loralib provides nice abstractions for the dense layers that one can use alongside the regular dense layers to control the parameterization.
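A minimal sketch of that end-to-end flow with the hypothetical LoRADense above: freeze the pretrained layer, train only the adapter, then fold the low-rank product back into the original kernel so inference needs no extra layers.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical pretrained layer and adapter (shapes chosen for illustration only).
base = layers.Dense(64)
base.build((None, 32))
base.trainable = False  # 1) freeze the original weights

lora = LoRADense(units=64, r=4, lora_alpha=4.0)  # 2) only these weights are trained
_ = base(tf.zeros((1, 32))) + lora(tf.zeros((1, 32)))  # outputs are simply summed

# 3) merge: W' = W + (A_down @ A_up) * scale
delta = tf.matmul(lora.lora_down.kernel, lora.lora_up.kernel) * lora.scale
base.kernel.assign(base.kernel + delta)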
It would be better to have this as a general feature. Other large transformer-based models will benefit from it.
Not too sure about it. For Transformer models that implement the core transformer blocks using Dense layers, I think LoRA abstractions should be plug-and-play. But models built from other kinds of layers (i.e., non-Dense layers) that might benefit from LoRA would likely not benefit from this design.
This is why we at Hugging Face started only with diffusion-specific LoRA updates. So far, in our experiments, we have found that updating the attention block parameters using LoRA is sufficient for high-fidelity image generation with Stable Diffusion. This is kind of in line with the original LoRA findings too.
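For the KerasCV port specifically, a rough way to scope this would be to enumerate only the Dense layers inside the diffusion model's attention blocks and attach LoRA there. The sketch below just collects candidates; the "attention" name filter is a guess, and the real projection layer names should be checked against the model.

import tensorflow as tf
from keras_cv.models import StableDiffusion

# Accessing .diffusion_model builds the U-Net and downloads its pretrained weights.
sd = StableDiffusion(img_height=512, img_width=512)
candidates = [
    layer for layer in sd.diffusion_model.submodules
    if isinstance(layer, tf.keras.layers.Dense) and "attention" in layer.name
]
print(f"{len(candidates)} Dense layers matched the (hypothetical) attention filter.")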
Would be nice to use the LoRADense in an end-to-end case.
@sayakpaul I am trying to build something here. Will share once I have it.
Not too sure about it
The reason I said we should think of making it available as a generic feature was to align it with the KerasCV mission. I know LoRA was designed with the attention block as the focus of study, and all the ablation studies were done for it. But given that it's a neat reparameterization technique, one might want to try it with something as old as VGG. It might not work, but it would be good to experiment with.
Another reason to suggest that it should be a generic block is that LoRA promises better deployment of large models. One might be able to deploy a Swin Transformer (to take an example) with LoRA.
Yeah, for CNN models and for models like Swin, which blocks to focus on when using LoRA deserves a separate study of its own.
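For experimenting in that direction, here is a rough sketch of a LoRA-style branch for Conv2D, in the spirit of the conv injection in cloneofsimo/lora: a rank-r "down" convolution followed by a zero-initialized 1x1 "up" convolution, added to the output of a frozen pretrained conv. Treat it as an experiment, not an established recipe.

import tensorflow as tf
from tensorflow.keras import layers

class LoRAConv2D(layers.Layer):
    # Experimental: low-rank update for a convolution, factored in the channel
    # dimension as a rank-r k x k conv followed by a zero-initialized 1x1 conv.
    def __init__(self, filters, kernel_size, r=4, lora_alpha=1.0, **kwargs):
        super().__init__(**kwargs)
        self.scale = lora_alpha / r
        self.lora_down = layers.Conv2D(
            r, kernel_size, padding="same", use_bias=False,
            kernel_initializer=tf.keras.initializers.RandomNormal(stddev=1 / r))
        self.lora_up = layers.Conv2D(
            filters, 1, padding="same", use_bias=False,
            kernel_initializer="zeros")

    def call(self, inputs):
        return self.lora_up(self.lora_down(inputs)) * self.scale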
It would be better to have this as a general feature. Other large transformer-based models will benefit from it.
@ayulockin I second this. Some LLMs require such a strategy on specific devices. If it turns out to be general, it should be added to core Keras.
but the biggest challenge IMO is the ability to easily replace a pretrained Dense layer (or EinsumDense layer) in a complex model graph.
https://github.com/keras-team/tf-keras/issues/262
On the fly to keras core https://github.com/keras-team/keras/pull/18942
Thanks for reporting the issue! We have consolidated the development of KerasCV into the new KerasHub package, which supports image, text, and multi-modal models. Please read https://github.com/keras-team/keras-hub/issues/1831. KerasHub will support all the core functionality of KerasCV.
KerasHub can be installed with !pip install -U keras-hub. Documentation and guides are available at keras.io/keras_hub.
With our focus shifted to KerasHub, we are not planning any further development or releases in KerasCV. If you encounter a KerasCV feature that is missing from KerasHub, or would like to propose an addition to the library, please file an issue with KerasHub.
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.