Sebastian Raschka
Right now, given that there are so many other things to do, this hasn't been high on my priority list. But we'd be happy about contributions if you are interested...
Thanks for the update @janEbert ! This looks good to me. Btw have you done a comparison (re memory usage) before and after by chance?
I see, yeah I think we should do some comparisons to make sure it works as intended. If you want to do them, that'd be nice! I suggest perhaps with...
That'd be awesome. And please let me know in case you need any help!
@janEbert Looks awesome, which model is that? I am also rerunning some of the models in the config hub and will update the numbers accordingly!
I just ran a quick comparison on a 4xA10G machine to see if I can reproduce the config hub performance ``` | falcon-7b/lora.yaml | falcon-7b | 4 | 512 |...
Not sure. I observed it with Phi-2 too: Main branch: ```bash litgpt finetune_lora checkpoints/microsoft/phi-2/ --devices 4 ``` ``` Epoch 1 | iter 1 step 0 | loss train: 2.424, val:...
> Why does the loss train increase (for the code from this PR)? From 2.299 up to 17.512. I am curious if the whole Block was maybe accidentally trainable (instead...
That's a good point, but I think there is a different issue here that I am not understanding yet 😅. When I reran the code I observed basically the same...
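One generic way to sanity-check a suspicion like "the whole Block was accidentally trainable" is to list which parameters actually have `requires_grad=True`. The sketch below is not litgpt's implementation; it uses a hypothetical minimal LoRA-style layer just to illustrate the check (in a real run you would call `trainable_param_names` on the finetuned model instead):

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: frozen base weight plus a trainable
    low-rank update. Illustrative only; initialization details (e.g.
    Gaussian init for lora_a) are omitted for brevity."""

    def __init__(self, in_features: int, out_features: int, r: int = 4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # Freeze the pretrained base weights.
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # Low-rank adapters stay trainable (nn.Parameter defaults to
        # requires_grad=True).
        self.lora_a = nn.Parameter(torch.zeros(r, in_features))
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T


def trainable_param_names(model: nn.Module) -> list[str]:
    """Return the names of all parameters the optimizer would update."""
    return [name for name, p in model.named_parameters() if p.requires_grad]


if __name__ == "__main__":
    layer = LoRALinear(8, 8)
    # Only the adapter weights should show up here; if base weights
    # appeared too, something was accidentally left trainable.
    print(trainable_param_names(layer))
```

If the base transformer weights show up in that list after setting up LoRA finetuning, that would explain a diverging training loss like the one above.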
Thanks for looking into this @TensorTemplar . I think that this may not be feasible then, so I am closing the PR for now. But happy to revisit this with...