Sebastian Raschka
> Ah your batch size is also quite small so might be best to try out `torch.compile(m, mode="reduce-overhead")` which will automatically enable CUDA graphs for you
>
> Recently added...
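For reference, a minimal sketch of what that call looks like (the toy model and shapes below are just placeholders, not the actual code being discussed):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for "m" above
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()

# "reduce-overhead" uses CUDA graphs to cut per-step kernel launch overhead,
# which matters most when the batch size (and per-step compute) is small
compiled_model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 128, device="cuda")  # small batch, hypothetical shape
out = compiled_model(x)
```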
Hm, what's your baseline speed on a single GPU? And how many workers are you using in the dataloader?
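E.g., what do you pass for `num_workers`? Just as a sketch with a dummy dataset (the numbers are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset; shapes and sizes are placeholders
dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 10, (1000,)))

# More workers usually help when data loading, not the GPU, is the bottleneck,
# which can hide any speedup from adding GPUs
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)
```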
I see, that's weird. So basically you only get 18 min for 4 GPUs, whereas you get 22 min on a single GPU? That's definitely weird; I don't think I...
Hm, that's weird. Maybe compilation is not supported on the Google Colab devices. I don't really have experience with Google Colab, and off the top of my head, I don't...
Hm, that's weird. It works for me when I use

```python
fabric = Fabric(accelerator="mps", devices=1)
```

Maybe you have an old version from before MPS was supported. Btw I am...
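In case it helps, here is a fuller minimal example of the kind of setup that works on my end (a sketch assuming a recent lightning release with MPS support; the model and data are just placeholders):

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="mps", devices=1)
fabric.launch()

# Tiny placeholder model and optimizer
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# Create the input on the device Fabric selected (mps)
x = torch.randn(4, 10, device=fabric.device)
loss = model(x).sum()
fabric.backward(loss)
optimizer.step()
```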
Hi @UltraArceus3 and @Nachiket18. Thanks for suggesting these and offering to help implement these! On my end, I am currently very maxed out with other projects and wouldn't have...
This is a good point. I think you are wondering about why there is this `self.linear` layer initialized on line 121, right? In my implementation in this repo, we assume...
I think your concern is that in your code, we use random weights here?

```python
class LinearWithLoRA(nn.Module):
    def __init__(self, in_features, out_features, rank, alpha):
        super().__init__()
        # Original linear layer
        self.linear = ...
```
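To make the point concrete, here is a sketch of the pattern (not necessarily the exact code in this repo): the pretrained `nn.Linear` is passed in and kept as-is, and only the two low-rank matrices are newly initialized, with `B` starting at zero so the layer's output is unchanged at the start of finetuning.

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    """Low-rank update: alpha * (x @ A @ B)."""
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        std_dev = 1 / torch.sqrt(torch.tensor(rank).float())
        self.A = nn.Parameter(torch.randn(in_dim, rank) * std_dev)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))  # zero init -> no change initially
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)


class LinearWithLoRA(nn.Module):
    """Wraps an existing (pretrained) nn.Linear rather than creating a new one,
    so the original weights are preserved and only the LoRA matrices are trained."""
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear  # the pretrained layer, passed in
        self.lora = LoRALayer(linear.in_features, linear.out_features, rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)


# Usage: swap a pretrained layer in place, e.g.
pretrained = nn.Linear(64, 32)
wrapped = LinearWithLoRA(pretrained, rank=8, alpha=16)
out = wrapped(torch.randn(4, 64))
```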
Nice, thanks for sharing. But wait a sec, there is no code in this repo, and they simply use HF in the Readme?
Woohoo, finally!!