DoRA
I found some confusing code in peft.
code:
result_dora = (mag_norm_scale - 1) * (F.linear(x, transpose(weight, self.fan_in_fan_out))) + mag_norm_scale * lora_B(lora_A(x)) * scaling

Question: What is the effect of (mag_norm_scale - 1) and mag_norm_scale? Also, it seems that result_dora cannot equal F.linear(x, transpose(weight, self.fan_in_fan_out)) at the initialization stage because of the "mag_norm_scale - 1" factor.
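
One possible reading, sketched below in plain Python: if result_dora is added to the base layer's output rather than replacing it, the "- 1" only cancels the base output that is already there, and the sum becomes mag_norm_scale * (W x + scaling * B A x). This is a sketch under assumptions that are not confirmed by the snippet above: that the caller adds result_dora to the base output, that mag_norm_scale is magnitude divided by the per-output-row L2 norm of W + scaling * B A, that magnitude is initialized to the row norms of W, and that lora_B starts at zero. The helper names (matvec, matmul, dora_terms) are hypothetical and are not peft functions.

```python
import math

def matvec(W, x):
    """W @ x for a row-major matrix W (out x in) and a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """A @ B for row-major matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def dora_terms(W, A, B, magnitude, scaling, x):
    """Return (base, result_dora, merged_out) for one input vector x.

    Assumption: mag_norm_scale = magnitude / ||W + scaling * B @ A||,
    with the L2 norm taken per output row.
    """
    BA = matmul(B, A)
    merged = [[w + scaling * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]
    weight_norm = [math.sqrt(sum(v * v for v in row)) for row in merged]
    mag_norm_scale = [m / n for m, n in zip(magnitude, weight_norm)]

    base = matvec(W, x)             # plays the role of F.linear(x, W)
    lora = matvec(B, matvec(A, x))  # plays the role of lora_B(lora_A(x))
    # Same expression as in the question, applied per output element.
    result_dora = [(s - 1) * b + s * l * scaling
                   for s, b, l in zip(mag_norm_scale, base, lora)]
    # Reference value: magnitude * (merged weight / its row norm) @ x
    merged_out = [s * v for s, v in zip(mag_norm_scale, matvec(merged, x))]
    return base, result_dora, merged_out

W = [[3.0, 4.0], [1.0, 0.0]]   # out=2, in=2
A = [[1.0, 2.0]]               # rank 1
x = [1.0, -1.0]
scaling = 2.0

# Initialization (assumed): lora_B is zero, magnitude equals the row norms of W.
base0, dora0, _ = dora_terms(W, A, [[0.0], [0.0]], [5.0, 1.0], scaling, x)
total0 = [b + d for b, d in zip(base0, dora0)]

# After some hypothetical training steps: lora_B and magnitude have moved.
base1, dora1, merged1 = dora_terms(W, A, [[0.5], [-1.0]], [2.0, 3.0], scaling, x)
total1 = [b + d for b, d in zip(base1, dora1)]
```

Under these assumptions, result_dora is a correction term and is not expected to equal the base output: at initialization mag_norm_scale is 1, so result_dora is zero and total0 matches base0, while later total1 matches the output of the rescaled merged weight.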