Ali Sabet
Works! Thanks luffycodes 🙏!
Hey @zwhe99, I got the model to train, but the weights aren't fully saved during checkpointing, even though I'm using the same `ZeRO-3.json` config and training settings. According to the...
Works! Thanks @luffycodes 🙏.
Your learning rate may be too high.
@vid-koci, what heuristic are you using in veles that uses so much less memory? Could I avoid your reported performance loss if I output the random walks instead and ran...
Yes, I was originally planning to apply it to diffusion models first, but the peft library has a more convenient API for injecting multiple LoRAs into the same model. Hoping...
Yes @sidnb13, you can stack the LoRAs into a single tensor and broadcast slices over their corresponding batch elements.
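A minimal NumPy sketch of that idea, assuming each LoRA is a pair of down/up projections of the same rank `r` and each batch element is tagged with one adapter index. The function name `batched_lora_forward` and all shapes are hypothetical, chosen for illustration; this is not the actual blora implementation. The stacked `lora_A`/`lora_B` tensors are indexed by `adapter_ids`, and a batched `einsum` applies each slice only to its own batch element, while the frozen base weight `W` is shared by everyone.

```python
import numpy as np

def batched_lora_forward(x, W, lora_A, lora_B, adapter_ids, scale=1.0):
    """Apply a shared base weight plus a per-example LoRA delta.

    x:           (batch, seq, d_in)  activations
    W:           (d_in, d_out)       frozen base weight shared by all examples
    lora_A:      (n_adapters, d_in, r)   stacked LoRA down-projections
    lora_B:      (n_adapters, r, d_out)  stacked LoRA up-projections
    adapter_ids: (batch,)            adapter index used by each batch element
    """
    base = x @ W                      # (batch, seq, d_out), one matmul for everyone
    A = lora_A[adapter_ids]           # (batch, d_in, r): gather one slice per example
    B = lora_B[adapter_ids]           # (batch, r, d_out)
    # The batch axis "b" pairs each example with its own adapter slice.
    delta = np.einsum("bsd,bdr->bsr", x, A)
    delta = np.einsum("bsr,bro->bso", delta, B)
    return base + scale * delta

# Small demo: 3 adapters, 4 requests that each pick an adapter.
rng = np.random.default_rng(0)
n_adapters, batch, seq, d_in, d_out, r = 3, 4, 5, 8, 6, 2
x = rng.standard_normal((batch, seq, d_in))
W = rng.standard_normal((d_in, d_out))
lora_A = rng.standard_normal((n_adapters, d_in, r))
lora_B = rng.standard_normal((n_adapters, r, d_out))
ids = np.array([0, 2, 1, 2])
out = batched_lora_forward(x, W, lora_A, lora_B, ids)
```

The gather copies one `(d_in, r)` and one `(r, d_out)` slice per batch element, which is cheap because `r` is small; in a real framework the same pattern maps onto a batched matmul (`bmm`) instead of `einsum`.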
@sidnb13 nice try! `segment_matmul` is the perfect function for a blora op, though the kernel probably isn't optimized. I also attempted parallelizing the blora op through matrix reshapes and stacking, seemed...
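To make the `segment_matmul` idea concrete, here is an unoptimized NumPy reference, assuming the function meant is the one in PyG's `pyg_lib.ops`, whose semantics we understand as: rows of `inputs` are grouped into contiguous segments delimited by offsets `ptr`, and segment `i` is multiplied by its own matrix `other[i]`. The helpers `segment_matmul_reference` and `blora_delta_via_segments` are hypothetical names for illustration only; they emulate the semantics with a Python loop and say nothing about the real kernel's performance. For blora, the tokens are sorted by adapter so each adapter's rows form one segment, and the result is un-sorted afterwards.

```python
import numpy as np

def segment_matmul_reference(inputs, ptr, other):
    """Reference segment matmul: out[ptr[i]:ptr[i+1]] = inputs[ptr[i]:ptr[i+1]] @ other[i].

    inputs: (total_rows, d_in), rows already sorted so each segment is contiguous
    ptr:    (n_segments + 1,) row offsets, ptr[0] == 0 and ptr[-1] == total_rows
    other:  (n_segments, d_in, d_out), one weight matrix per segment
    """
    out = np.empty((inputs.shape[0], other.shape[2]), dtype=np.result_type(inputs, other))
    for i in range(len(ptr) - 1):
        start, end = ptr[i], ptr[i + 1]
        out[start:end] = inputs[start:end] @ other[i]  # empty segments are a no-op
    return out

def blora_delta_via_segments(x, lora_A, lora_B, adapter_ids):
    """LoRA delta for a mixed-adapter batch, computed with segment matmuls."""
    batch, seq, d_in = x.shape
    n_adapters = lora_A.shape[0]
    order = np.argsort(adapter_ids, kind="stable")              # group examples by adapter
    rows = x[order].reshape(batch * seq, d_in)                  # flatten tokens, sorted by adapter
    counts = np.bincount(adapter_ids, minlength=n_adapters) * seq  # rows per adapter
    ptr = np.concatenate([[0], np.cumsum(counts)])              # segment offsets
    h = segment_matmul_reference(rows, ptr, lora_A)             # down-projection per segment
    delta = segment_matmul_reference(h, ptr, lora_B)            # up-projection per segment
    delta = delta.reshape(batch, seq, -1)
    out = np.empty_like(delta)
    out[order] = delta                                           # undo the sort
    return out

# Small demo: 4 requests spread over 3 adapters.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3, 8))
lora_A = rng.standard_normal((3, 8, 2))
lora_B = rng.standard_normal((3, 2, 5))
ids = np.array([2, 0, 2, 1])
delta = blora_delta_via_segments(x, lora_A, lora_B, ids)
```

The sort and un-sort are the price of the segment layout; the payoff is that each adapter's weights are read once per segment rather than once per example.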