The effect of LoRA finetuning with different target_modules
Hello, I am using the default target_modules (q_proj, v_proj) and the results look good.
Would the results improve if I made more target_modules trainable, such as the ones below (see the sketch after the list)?
"q_proj",
"v_proj",
"down_proj",
"gate_proj",
"up_proj",
I heard that people also get good results fine-tuning the "fc1" and "fc2" modules, according to this paper:
"we conclude that modifying head attention shows the best results when the parameter budget is very small, while the FFN can better utilize modifications at larger capacities."
Will adding fc1 and fc2 as target modules add additional trainable parameters?
Yes, I believe it does add additional trainable parameters: every module listed in target_modules gets its own pair of low-rank LoRA matrices, so a longer list means more trainable weights.
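A quick way to check this yourself is to compare trainable-parameter counts for the two configurations. This is only a rough sketch; the fc1/fc2 names apply to models whose MLP layers actually use those names (e.g. OPT-style architectures), and facebook/opt-350m is just an example of such a model:

```python
# Hypothetical comparison of trainable-parameter counts for two target_modules choices.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def count_trainable(model_name, target_modules):
    base = AutoModelForCausalLM.from_pretrained(model_name)
    peft_model = get_peft_model(
        base,
        LoraConfig(r=8, target_modules=target_modules, task_type="CAUSAL_LM"),
    )
    return sum(p.numel() for p in peft_model.parameters() if p.requires_grad)

name = "facebook/opt-350m"  # example model whose MLP layers are named fc1/fc2
attn_only = count_trainable(name, ["q_proj", "v_proj"])
with_ffn = count_trainable(name, ["q_proj", "v_proj", "fc1", "fc2"])
print(f"attention-only LoRA params:  {attn_only:,}")
print(f"attention + FFN LoRA params: {with_ffn:,}")  # expect a larger count
```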