
Experiment applying MARVEL

Open Yoogeonhui opened this issue 2 years ago • 4 comments

https://openreview.net/forum?id=lq62uWRJjiY

Recent progress applying MARVEL to LMs showed better performance than LoRA. I haven't read the paper thoroughly, but it also seems applicable to the diffusion process.

Yoogeonhui avatar Jan 30 '23 03:01 Yoogeonhui

Wow, this looks very similar to the idea @brian6091 had. Might want to have a look here!

cloneofsimo avatar Jan 30 '23 03:01 cloneofsimo

Some random initial thoughts on the idea

  1. Why not just optimize P and Q on the Stiefel manifold directly, which is the idea I had in mind? This mimic-retraction-like approach might not be the optimal way of doing things, and many alternative approaches could readily be made here (one possibility is sketched after this item's discussion). Still, it looks really awesome.

The way they allocate the diagonal entries to meet a rank budget and perform discrete optimization there sounds amazing. It also seems to have an effect on generalization, considering how well they perform compared to full fine-tuning. The results are clearly very impressive.

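For concreteness, here is a minimal sketch of the alternative I have in mind: instead of retraction-style updates, keep P and Q near the Stiefel manifold with a soft orthogonality penalty added to the task loss. The function name, shapes, and penalty weight below are my own illustration, not code from the paper.

```python
import torch

def orthogonality_penalty(P: torch.Tensor, Q: torch.Tensor) -> torch.Tensor:
    """Soft penalty pushing P (d x r) and Q (r x k) toward the Stiefel
    manifold, i.e. P^T P ~ I and Q Q^T ~ I.

    Hypothetical sketch of the soft-constraint alternative discussed above,
    not the paper's reference implementation.
    """
    r = P.shape[1]
    eye = torch.eye(r, device=P.device, dtype=P.dtype)
    return ((P.T @ P - eye) ** 2).sum() + ((Q @ Q.T - eye) ** 2).sum()

# Usage: add to the task loss with a small weight, e.g.
#   loss = task_loss + 1e-3 * orthogonality_penalty(P, Q)
```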

  2. The end rank distribution they demonstrate is also striking.

This is incredibly fascinating. One can imagine that this saves a LOT of budget, since one "has" to use rank 12 in the case above to get the model shown. The deeper the layer, the larger the rank it needs, which definitely makes sense from a very classical feature-representation perspective.

This is also demonstrated by Figure 1 of the paper.

  3. Pruning with the diagonal term is really simple, and according to them it consistently outperforms naive LoRA and performs similarly to MARVEL (Algorithm 1). MARVEL is difficult to implement, but a singular-value-based proximal operator is simple, so the importance score $S_i = |\lambda_i|$ might be a good starting point (a minimal sketch follows below).

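A rough sketch of that starting point, assuming a P @ diag(lambda) @ Q parametrization of the update: rank the diagonal entries by magnitude and zero everything outside the budget. Function name, shapes, and the tie-breaking behavior are illustrative, not the paper's algorithm.

```python
import torch

@torch.no_grad()
def prune_by_singular_value(lambdas: torch.Tensor, budget: int) -> torch.Tensor:
    """Keep only the `budget` diagonal entries with the largest magnitude,
    using the simple importance score S_i = |lambda_i| mentioned above.

    `lambdas` holds the learned diagonal (one entry per rank); entries below
    the cutoff are zeroed in place. Ties at the cutoff are kept, so slightly
    more than `budget` entries may survive in degenerate cases.
    """
    scores = lambdas.abs()
    if budget < lambdas.numel():
        # the budget-th largest score becomes the keep threshold
        threshold = scores.topk(budget).values.min()
        lambdas[scores < threshold] = 0.0
    return lambdas
```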

cloneofsimo avatar Jan 30 '23 03:01 cloneofsimo

I actually find this very appealing, since we already needed to work on the scaling part as well. Saving into the LoRA format would be the common part, which means the saving format needed to be fixed anyway. We can pull this off very nicely by inheriting from the LoRA class, reparametrizing its form, and modifying the save function. We might as well make it compatible with diffusers' format all at once. Getting decent performance with this modification sounds almost certain. A rough sketch is below.
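Something along these lines, assuming an injected-linear module with `lora_down` / `lora_up` submodules like the ones in this repo (class and key names below are illustrative and may not match the actual API): reparametrize the delta as P @ diag(lambda) @ Q, and fold the diagonal back into the up-projection at save time so the on-disk file is an ordinary rank-r LoRA pair.

```python
import torch
import torch.nn as nn

class AdaptiveRankLinear(nn.Module):
    """Hypothetical sketch: a LoRA-style delta reparametrized as
    P @ diag(lambda) @ Q so the diagonal can be pruned later.

    Only the low-rank delta is computed here; the frozen base linear is
    omitted for brevity.
    """

    def __init__(self, in_features: int, out_features: int, r: int = 8, scale: float = 1.0):
        super().__init__()
        self.lora_down = nn.Linear(in_features, r, bias=False)   # Q: r x in
        self.lora_up = nn.Linear(r, out_features, bias=False)    # P: out x r
        self.lora_diag = nn.Parameter(torch.zeros(r))            # lambda, zero-init
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # delta(x) = scale * P @ diag(lambda) @ Q @ x
        return self.lora_up(self.lora_diag * self.lora_down(x)) * self.scale

    def to_lora_state_dict(self) -> dict:
        # Fold diag(lambda) into the up weight so the result is a plain
        # (lora_down, lora_up) pair loadable by existing LoRA code.
        up = self.lora_up.weight.data * self.lora_diag.view(1, -1)
        return {"lora_down.weight": self.lora_down.weight.data.clone(),
                "lora_up.weight": up}
```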

cloneofsimo avatar Jan 30 '23 03:01 cloneofsimo

Thanks for the paper! @cloneofsimo this should mix nicely with training/varying rank by block. I'm having a closer read of the paper.
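For concreteness, per-block rank budgets that an adaptive allocation scheme could start from might look like the dict below; the block-name prefixes and numbers are made up for illustration, not taken from the repo.

```python
# Hypothetical per-block rank budgets (names and values are illustrative).
block_rank_budget = {
    "down_blocks.0": 4,
    "down_blocks.1": 4,
    "mid_block": 8,
    "up_blocks.2": 12,
    "up_blocks.3": 12,
}

def rank_for(module_name: str, default: int = 4) -> int:
    """Pick a rank budget by the first matching block-name prefix."""
    for prefix, r in block_rank_budget.items():
        if module_name.startswith(prefix):
            return r
    return default
```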

brian6091 avatar Jan 30 '23 08:01 brian6091