LoRA-ViT
Why only tune query and value in every attention block?
Hello @JamesQFreeman,
Thanks for your great code; it makes re-implementing LoRA in ViT easy and has been very useful for adapting it to our own task.
However, I noticed that only the query and value weights in each attention block are adapted, and I'm curious about the rationale behind this choice. In the original LoRA paper, the authors report that, under the same parameter budget, adapting all four attention weight matrices (query, key, value, and output projection) with a smaller rank can perform slightly better than adapting only query and value.
Is there a trade-off between computational efficiency and final performance that led to this decision? Also, have you seen improved results on your own tasks when the key and output-projection weights are adapted as well?
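For concreteness, here is a rough sketch of what I mean by extending LoRA to all four projections. This is not your repo's actual code (the class and helper names below are made up, and timm's ViT fuses q/k/v into a single `qkv` Linear, which your implementation slices for q and v), just a generic PyTorch illustration:

```python
import math
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 4):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the LoRA factors are trained.
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.kaiming_uniform_(self.lora_a.weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_b.weight)  # B = 0, so training starts from the frozen layer
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


def add_lora_to_attention(attn: nn.Module, r: int = 4, alpha: int = 4,
                          targets=("q", "k", "v", "proj")) -> nn.Module:
    """Hypothetical helper: wrap the chosen projections of one attention block.

    Assumes the block exposes separate `q`, `k`, `v`, `proj` nn.Linear modules;
    with timm's fused qkv Linear one would instead slice the qkv weight, as
    your repo already does for query and value.
    """
    for name in targets:
        if hasattr(attn, name) and isinstance(getattr(attn, name), nn.Linear):
            setattr(attn, name, LoRALinear(getattr(attn, name), r=r, alpha=alpha))
    return attn
```

With something like this it would be straightforward to ablate key and projection separately and compare against the query/value-only setup.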
Thank you, and I look forward to hearing from you.