Andrei-Aksionov
Hi @michele-milesi, thanks for reporting.

> We analyzed the dtype of the pre-trained model weights at the moment of the call to the `merge_lora_weights()` function, some of them are `torch.uint8`...
Got it. I'll check later this week.
Hello @peterkim95 I've added some annotations to the LoRA code in the lit-llama [repo](https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/lora.py), which you might find helpful. --- Nevertheless, I don't quite understand why there is a combination of Linear...
Hello @clalanliu As I understand it, with `nn.Conv1d` and the `groups` parameter each part of the combined `qkv` matrix is processed independently, while with `nn.Linear` the `lora_B` matrix will "see" and...
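To make the difference concrete, here is a toy numpy sketch (all sizes and values are made up for illustration): a grouped 1x1 convolution is equivalent to multiplying by a block-diagonal matrix, so each q/k/v output slice only "sees" its own input slice, while a plain linear layer mixes all inputs into every output.

```python
import numpy as np

r, out_per_part = 2, 3              # toy LoRA rank and output size per q/k/v part
x = np.arange(3 * r, dtype=float)   # fused LoRA activations: [q_part | k_part | v_part]

# Conv1d(kernel_size=1, groups=3) acts like a block-diagonal matrix:
# each q/k/v output block is computed only from its own input slice.
grouped_W = np.zeros((3 * out_per_part, 3 * r))
for i in range(3):
    grouped_W[i * out_per_part:(i + 1) * out_per_part, i * r:(i + 1) * r] = i + 1.0

# A plain nn.Linear is one dense matrix: every output row mixes all inputs.
dense_W = np.ones((3 * out_per_part, 3 * r))

y_grouped = grouped_W @ x
y_dense = dense_W @ x
```

Perturbing the k-slice of the input leaves the q outputs untouched in the grouped case, but changes them in the dense case, which is exactly the "independent vs. shared" distinction above.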
> There is no need to do so, because the input of QKV matrices are all the same (that is, x).

Oh boy, how did I miss that 🤣. Thanks
I recommend validating it on a local machine; the terminal in a Studio might behave weirdly.
Hello @batman-do It looks like GaLore is a drop-in replacement for the PyTorch optimizer, meaning that you can take any script that does pretraining/fine-tuning and replace the part that defines...
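The "drop-in" part can be sketched in plain Python: a training loop only relies on the standard optimizer interface (`zero_grad`/`step`), so swapping the optimizer class is a one-line change. The classes below are simplified stand-ins with a toy update rule, not the real `torch.optim` or `galore_torch` implementations.

```python
class SGD:  # stand-in for torch.optim.SGD (toy update rule, plain dicts for params)
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def zero_grad(self):
        for p in self.params:
            p["grad"] = 0.0

    def step(self):
        for p in self.params:
            p["value"] -= self.lr * p["grad"]


class GaLoreAdamW(SGD):
    # Stand-in with the same interface; the real galore_torch.GaLoreAdamW
    # additionally projects gradients into a low-rank subspace.
    pass


def train(optimizer, steps=3):
    """Toy loop minimizing f(x) = x^2; it never cares which optimizer it got."""
    for _ in range(steps):
        optimizer.zero_grad()
        for p in optimizer.params:
            p["grad"] = 2 * p["value"]  # gradient of x^2
        optimizer.step()
    return [p["value"] for p in optimizer.params]


# Swapping optimizers touches only the construction line:
result_sgd = train(SGD([{"value": 1.0, "grad": 0.0}], lr=0.1))
result_galore = train(GaLoreAdamW([{"value": 1.0, "grad": 0.0}], lr=0.1))
```

Because both classes expose the same interface, the rest of the script is unchanged, which is what makes GaLore easy to slot into an existing pretraining/fine-tuning script.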
Well, that's expected ☺️ So what should I do: 1. Close this PR and you will add the changes that you like on your own, of course if you want...
@erikqu Your solution is small and simple (which is awesome), but it increases `batch_size` every third step from some starting value (currently it's 4, but I think you will make...
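For reference, a schedule along the lines described might look like this. Purely illustrative: the starting value, the doubling rule, and the function name are my assumptions, not the PR's actual code.

```python
def batch_size_at(step, start=4, every=3):
    """Toy schedule: double the batch size every `every` steps.

    Illustrative only; the PR's actual growth rule may differ.
    """
    return start * 2 ** (step // every)


# steps 0-2 -> 4, steps 3-5 -> 8, steps 6-8 -> 16, ...
sizes = [batch_size_at(s) for s in range(9)]
```

Note that such a schedule grows without bound, so in practice it would need a cap tied to available memory.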
Oops 😺 Not only am I guilty of the "works on my machine" approach, I also left in an unnecessary import (from another approach). I guess I am too used to various linters...