Sebastian Raschka

846 comments by Sebastian Raschka

Right now, given that there are so many other things to do, I haven't had that on my priority list. But we'd be happy about contributions if you are interested...

Thanks for the update, @janEbert! This looks good to me. Btw, have you done a before/after comparison (re: memory usage), by chance?

I see, yeah I think we should do some comparisons to make sure it works as intended. If you want to do them, that'd be nice! I suggest perhaps with...
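A before/after comparison like the one suggested above could be sketched roughly as follows. This is a minimal CPU analogue using Python's stdlib `tracemalloc`, not the actual finetuning runs; for GPU runs one would instead reset and read `torch.cuda.max_memory_allocated` around each run. The two lambdas are hypothetical stand-ins for the old and new code paths.

```python
import tracemalloc

def peak_memory_mib(fn, *args, **kwargs):
    """Run fn and return its peak Python heap allocation in MiB.

    CPU analogue for illustration only; a real comparison of the two
    branches would measure peak GPU memory per finetuning run instead.
    """
    tracemalloc.start()
    fn(*args, **kwargs)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / (1024 ** 2)

# Hypothetical stand-ins for the "before" and "after" code paths
before = peak_memory_mib(lambda: [0] * 1_000_000)
after = peak_memory_mib(lambda: [0] * 500_000)
print(f"before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Running each configuration a few times and comparing the peaks (rather than a single run) helps rule out allocator noise.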

That'd be awesome. And please let me know in case you need any help!

@janEbert Looks awesome, which model is that? I am also rerunning some of the models in the config hub and will update the numbers accordingly!

I just ran a quick comparison on a 4xA10G machine to see if I can reproduce the config hub performance ``` | falcon-7b/lora.yaml | falcon-7b | 4 | 512 |...

Not sure. I observed it with Phi-2 too. Main branch:
```bash
litgpt finetune_lora checkpoints/microsoft/phi-2/ --devices 4
```
```
Epoch 1 | iter 1 step 0 | loss train: 2.424, val:...
```

> Why does the train loss increase (for the code from this PR)? From 2.299 up to 17.512. I am curious if the whole Block was maybe accidentally trainable (instead...
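The "accidentally trainable" hypothesis above can be checked directly by counting which parameters have `requires_grad` set. A minimal sketch with a toy module standing in for a transformer Block (the module and the frozen/trainable split are hypothetical, just to illustrate the check):

```python
import torch.nn as nn

def trainable_param_report(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts for a module."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

# Toy stand-in for a Block: freeze the first layer, leave the second
# trainable (mimicking a LoRA setup where only adapter weights train)
block = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
for p in block[0].parameters():
    p.requires_grad = False

trainable, total = trainable_param_report(block)
print(f"{trainable}/{total} parameters trainable")
```

If the report shows far more trainable parameters than the LoRA adapters should contribute, the whole Block is indeed being trained by accident.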

That's a good point, but I think there is a different issue here that I am not understanding yet 😅. When I reran the code, I observed basically the same...

Thanks for looking into this @TensorTemplar . I think that this may not be feasible then, so I am closing the PR for now. But happy to revisit this with...