LoRA-ViT
Why only tune query and value in every attention block?
Hello @JamesQFreeman,
Thanks for your great code; it makes re-implementing LoRA in ViT easy and has been very useful for adapting it to our own task.
However, I noticed that only the query and value weights in each attention block are adapted, and I'm curious about the rationale behind this choice. In the original LoRA paper, the authors report that, under the same parameter budget, adapting all four attention weight matrices (query, key, value, and output projection) with a smaller rank can perform slightly better than adapting only query and value.
Is there a trade-off between computational efficiency and final performance that led to this decision? Also, have you seen improved results on your own tasks when the key and output-projection weights are adapted as well?
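For concreteness, here is a rough sketch of what I mean by extending LoRA to all four projections. This is not your repo's actual code (the class and helper names below are made up, and timm's ViT fuses q/k/v into a single `qkv` Linear, which your implementation slices for q and v), just a generic PyTorch illustration:

```python
import math
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 4):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the LoRA factors are trained.
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.kaiming_uniform_(self.lora_a.weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_b.weight)  # B = 0, so training starts from the frozen layer
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


def add_lora_to_attention(attn: nn.Module, r: int = 4, alpha: int = 4,
                          targets=("q", "k", "v", "proj")) -> nn.Module:
    """Hypothetical helper: wrap the chosen projections of one attention block.

    Assumes the block exposes separate `q`, `k`, `v`, `proj` nn.Linear modules;
    with timm's fused qkv Linear one would instead slice the qkv weight, as
    your repo already does for query and value.
    """
    for name in targets:
        if hasattr(attn, name) and isinstance(getattr(attn, name), nn.Linear):
            setattr(attn, name, LoRALinear(getattr(attn, name), r=r, alpha=alpha))
    return attn
```

With something like this it would be straightforward to ablate key and projection separately and compare against the query/value-only setup.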
Thank you, and I look forward to hearing from you.