
ReMoRA + DoRA improves on ReMoRA

catid opened this issue · 2 comments

Thank you for sharing your results. In return I will share my own:

If you reformulate the code so that, during the forward pass, the decompressed MoRA weights are added into the nn.Linear weights, the number of matrix multiplies drops back to the normal count. It also becomes compatible with DoRA. In my testing, alternating between repeat and repeat_interleave (ReMoRA) improves on MoRA for continued training, and ReMoRA + DoRA improves on ReMoRA.
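A minimal sketch of what that reformulation could look like (this is not the code referenced in this thread; the `ReMoRALinear` class and the repeat-based decompression are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReMoRALinear(nn.Module):
    """Sketch: the square MoRA matrix is decompressed to the full weight
    shape and added to the frozen nn.Linear weight inside forward(), so
    the layer still performs a single matmul."""

    def __init__(self, base: nn.Linear, r: int, interleave: bool = False):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # frozen pretrained weight
        self.r = r
        self.interleave = interleave                  # flipped between ReMoRA phases
        self.mora = nn.Parameter(torch.zeros(r, r))   # trainable square matrix

    def decompress(self) -> torch.Tensor:
        # Assumed decompression: tile the r x r block up to the full
        # out_features x in_features shape, then truncate.
        out_f, in_f = self.base.out_features, self.base.in_features
        reps_out, reps_in = -(-out_f // self.r), -(-in_f // self.r)  # ceil div
        if self.interleave:
            full = self.mora.repeat_interleave(reps_out, dim=0)
            full = full.repeat_interleave(reps_in, dim=1)
        else:
            full = self.mora.repeat(reps_out, reps_in)
        return full[:out_f, :in_f]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One weight add, then the usual single matmul of nn.Linear.
        w = self.base.weight + self.decompress()
        return F.linear(x, w, self.base.bias)
```

Because forward() materializes the full weight, a DoRA-style magnitude/direction rescaling can be applied to that same weight, and flipping `interleave` between training phases gives the repeat / repeat_interleave alternation described above.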

catid · Jun 07 '24

Thanks for sharing the results and advice.

I have tested adding the decompressed MoRA to the weight before, but it can be slow in large language models because it needs to copy the entire weight during the forward pass (maybe this can be further optimized, since MoRA can copy its weights directly into the original linear layer to merge back, instead of multiplying two matrices as LoRA does).
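To illustrate the merge-cost difference with a toy example (a sketch under assumptions, not the repository's code; the repeat-based decompression is an assumption): LoRA needs a matrix product to form its update, while the MoRA update only needs to be tiled/copied into place, though the full weight is still touched.

```python
import torch

d_out, d_in, r = 8, 8, 4
W = torch.randn(d_out, d_in)                      # frozen pretrained weight

# LoRA merge: forming the update requires a matrix multiplication.
B, A = torch.randn(d_out, r), torch.randn(r, d_in)
W_lora = W + B @ A

# MoRA merge (assumed repeat-style decompression): the square matrix is
# only tiled/copied into the weight, no matrix product, but the whole
# d_out x d_in weight is still written, which is the copy cost noted above.
M = torch.randn(r, r)
W_mora = W + M.repeat(d_out // r, d_in // r)
```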

For ReMoRA + DoRA, are you adding DoRA and MoRA in the same linear layer? That seems to use more trainable parameters than ReMoRA alone. That said, the idea of using both MoRA and LoRA in a linear layer seems interesting, since it might take advantage of both of them.

kongds · Jun 07 '24

Example: https://github.com/catid/dora/blob/9b2055d0b8dd73890e6fbca585a0e52a6a87dde3/dora.py#L66

catid · Aug 09 '24