
Deterministic LoRA initialization

dxqb opened this issue 8 months ago · 2 comments

I noticed during multi-GPU experiments that the model parameters weren't the same on all GPUs. This is because the LoRA initialization used the system seed and was therefore not deterministic. This PR makes the initialization deterministic, which is also useful for single-GPU training: we have wondered before why repeating the same training with the same parameters doesn't produce the same outcome.
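
As a rough illustration of the idea (a sketch only, not OneTrainer's actual code; the helper name, the `seed` parameter, and the layer names are made up for this example), the LoRA matrices can be initialized from a dedicated, fixed-seed generator instead of the global RNG:

```python
import torch
import torch.nn as nn

def init_lora_weights(lora_down: nn.Linear, lora_up: nn.Linear, seed: int = 42) -> None:
    # Hypothetical helper: draws the LoRA weights from a dedicated generator
    # so the result does not depend on the global (system) seed.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    with torch.no_grad():
        # Common LoRA scheme: random init for the down projection,
        # zeros for the up projection so the adapter starts as a no-op.
        lora_down.weight.copy_(
            torch.randn(lora_down.weight.shape, generator=generator)
            / lora_down.in_features ** 0.5
        )
        lora_up.weight.zero_()
```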

dxqb avatar Apr 20 '25 10:04 dxqb

I am tending towards closing this PR:

  • it is more complicated than expected
  • Multi-GPU https://github.com/Nerogar/OneTrainer/pull/816 no longer requires it. Since this commit https://github.com/Nerogar/OneTrainer/pull/816/commits/74633b87633cbe27bccf2b580b8126c569d6fe4e, instead of relying on deterministic initialization on every GPU, the parameters of GPU 0 are broadcast to all other GPUs so that training starts from an identical model state (see the sketch after this list)
  • broadcasting is also safer in case of (current and future) bugs in deterministic initialization
  • the benefit otherwise is quite limited
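
For reference, the broadcast approach mentioned above looks roughly like this (a sketch under the assumption that torch.distributed has already been initialized; it is not the code from the linked commit):

```python
import torch
import torch.distributed as dist

def broadcast_model_from_rank0(model: torch.nn.Module) -> None:
    # Copy rank 0's parameters and buffers to all other ranks so every
    # process starts from an identical model state.
    for tensor in list(model.parameters()) + list(model.buffers()):
        dist.broadcast(tensor.data, src=0)
```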

Unless someone is very interested in deterministic LoRA initialization for other reasons, I propose closing this PR without merging.

dxqb avatar Jul 10 '25 17:07 dxqb

I think there is still value in having this feature, but it's more of a "nice to have". Deterministic initialization could improve reproducibility of training runs, which can make testing easier.

Nerogar avatar Jul 19 '25 19:07 Nerogar