Andrei-Aksionov
Hi @michele-milesi, thanks for reporting.

> We analyzed the dtype of the pre-trained model weights at the moment of the call to the `merge_lora_weights()` function, some of them are `torch.uint8`...
Got it. I'll check later this week.
Hello @peterkim95 I've added some annotations to the LoRA code in the lit-llama [repo](https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/lora.py), which you might find helpful. --- Nevertheless, I don't quite understand why there is a combination of Linear...
Hello @clalanliu As I understand it, with `nn.Conv1d` and the `groups` parameter each part of the combined `qkv` matrix is processed independently, while with `nn.Linear` the `lora_B` matrix will "see" and...
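To make the difference concrete, here is a toy numpy sketch (all sizes and values are made up for illustration): a grouped 1x1 convolution is equivalent to multiplying by a block-diagonal matrix, so each q/k/v output slice only "sees" its own input slice, while a plain linear layer mixes all inputs into every output.

```python
import numpy as np

r, out_per_part = 2, 3              # toy LoRA rank and output size per q/k/v part
x = np.arange(3 * r, dtype=float)   # fused LoRA activations: [q_part | k_part | v_part]

# Conv1d(kernel_size=1, groups=3) acts like a block-diagonal matrix:
# each q/k/v output block is computed only from its own input slice.
grouped_W = np.zeros((3 * out_per_part, 3 * r))
for i in range(3):
    grouped_W[i * out_per_part:(i + 1) * out_per_part, i * r:(i + 1) * r] = i + 1.0

# A plain nn.Linear is one dense matrix: every output row mixes all inputs.
dense_W = np.ones((3 * out_per_part, 3 * r))

y_grouped = grouped_W @ x
y_dense = dense_W @ x
```

Perturbing the k-slice of the input leaves the q outputs untouched in the grouped case, but changes them in the dense case, which is exactly the "independent vs. shared" distinction above.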
> There is no need to do so, because the input of QKV matrices are all the same (that is, x).

Oh boy, how did I miss that 🤣. Thanks
I recommend validating it on a local machine; the terminal in a Studio might behave weirdly.
Hello @batman-do It looks like GaLore is a drop-in replacement for the PyTorch optimizer, meaning that you can take any script that does pretraining/fine-tuning and replace the part that defines...
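The "drop-in" part can be sketched in plain Python: a training loop only relies on the standard optimizer interface (`zero_grad`/`step`), so swapping the optimizer class is a one-line change. The classes below are simplified stand-ins with a toy update rule, not the real `torch.optim` or `galore_torch` implementations.

```python
class SGD:  # stand-in for torch.optim.SGD (toy update rule, plain dicts for params)
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def zero_grad(self):
        for p in self.params:
            p["grad"] = 0.0

    def step(self):
        for p in self.params:
            p["value"] -= self.lr * p["grad"]


class GaLoreAdamW(SGD):
    # Stand-in with the same interface; the real galore_torch.GaLoreAdamW
    # additionally projects gradients into a low-rank subspace.
    pass


def train(optimizer, steps=3):
    """Toy loop minimizing f(x) = x^2; it never cares which optimizer it got."""
    for _ in range(steps):
        optimizer.zero_grad()
        for p in optimizer.params:
            p["grad"] = 2 * p["value"]  # gradient of x^2
        optimizer.step()
    return [p["value"] for p in optimizer.params]


# Swapping optimizers touches only the construction line:
result_sgd = train(SGD([{"value": 1.0, "grad": 0.0}], lr=0.1))
result_galore = train(GaLoreAdamW([{"value": 1.0, "grad": 0.0}], lr=0.1))
```

Because both classes expose the same interface, the rest of the script is unchanged, which is what makes GaLore easy to slot into an existing pretraining/fine-tuning script.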
Well, that's expected ☺️ So what should I do: 1. Close this PR and you will add the changes that you like on your own, of course if you want...
@erikqu Your solution is small and simple (which is awesome), but it increases `batch_size` every third step from some starting value (currently it's 4, but I think you will make...
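For reference, a schedule along the lines described might look like this. Purely illustrative: the starting value, the doubling rule, and the function name are my assumptions, not the PR's actual code.

```python
def batch_size_at(step, start=4, every=3):
    """Toy schedule: double the batch size every `every` steps.

    Illustrative only; the PR's actual growth rule may differ.
    """
    return start * 2 ** (step // every)


# steps 0-2 -> 4, steps 3-5 -> 8, steps 6-8 -> 16, ...
sizes = [batch_size_at(s) for s in range(9)]
```

Note that such a schedule grows without bound, so in practice it would need a cap tied to available memory.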
Oops 😺 Not only am I guilty of the "works on my machine" approach, I also left in an unnecessary import (from another approach). I guess I am too used to various linters...