Andrei-Aksionov


Maybe it's about time to have a separate .py file with the shared logic? All the `prepare.py` files, for Shakespeare and these two new datasets, basically do the same thing...
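
A minimal sketch of what such a shared module could look like; the file and function names (`prepare_common.py`, `prepare`) are placeholders of mine, not the actual ones in the repo:

```python
# prepare_common.py -- hypothetical shared module; names are placeholders
from pathlib import Path
from typing import Callable

import numpy as np
import requests


def prepare(url: str, destination: Path, encode: Callable[[str], list], val_fraction: float = 0.1) -> None:
    """Download a raw text dataset, tokenize it and write train/val splits as .bin files."""
    destination.mkdir(parents=True, exist_ok=True)

    # download the raw text only once
    raw_path = destination / "input.txt"
    if not raw_path.exists():
        raw_path.write_text(requests.get(url, timeout=60).text, encoding="utf-8")

    # tokenize and split into train/val
    data = raw_path.read_text(encoding="utf-8")
    split_at = int(len(data) * (1 - val_fraction))
    for split, chunk in (("train", data[:split_at]), ("val", data[split_at:])):
        ids = np.array(encode(chunk), dtype=np.uint16)
        ids.tofile(destination / f"{split}.bin")
```

Each dataset-specific `prepare.py` would then shrink to a single call with its own URL and tokenizer.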

Hello @maximedb. I have a feeling you've already found the answer, since you asked this question last December; nevertheless, I would like to answer it. In the basic implementation...

@rasbt How much of an improvement in VRAM consumption did you see with LoRA+GaLore? With any PEFT algo the number of parameters to optimize shouldn't be that significant.
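
To put a rough number on it, here is a toy illustration (a stand-in model, not the actual setup): the trainable LoRA parameters, and therefore the optimizer states that GaLore would shrink, are a tiny fraction of the total.

```python
# Toy sanity check: with LoRA only a small fraction of parameters is trainable,
# so the optimizer states are already small before GaLore enters the picture.
import torch.nn as nn


def trainable_fraction(model: nn.Module) -> float:
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total


base = nn.Linear(4096, 4096, bias=False)   # frozen "pretrained" weight
base.weight.requires_grad = False
lora_a = nn.Linear(4096, 8, bias=False)    # trainable low-rank adapters, rank 8
lora_b = nn.Linear(8, 4096, bias=False)

model = nn.ModuleDict({"base": base, "lora_a": lora_a, "lora_b": lora_b})
print(f"trainable fraction: {trainable_fraction(model):.2%}")  # ~0.39%
```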

Hey @Dev-Khant, you don't need my approval, since I'm not a maintainer (though I approved anyway 🙂). @rasbt, since you are a markdown Jedi, could you look at the changes...

Sure, I can do this. Hopefully it's not too complicated 🤞

So, there are three models available. As can be seen from the table, the `2b` and `7b` models are mostly for code completion; they require a special prompt in the format: ```text...

What if there is only a single device, but the model doesn't fit? Shouldn't the code switch to layer offloading? I think the DeepSpeed strategy from Fabric supports it.
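
A minimal sketch of what I mean, assuming the `DeepSpeedStrategy` arguments in Lightning Fabric are still named like this (`stage`, `offload_parameters`, `offload_optimizer`):

```python
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    stage=3,                   # ZeRO stage 3: partition params, grads and optimizer states
    offload_parameters=True,   # spill parameters to CPU when the GPU runs out of memory
    offload_optimizer=True,    # keep optimizer states on CPU as well
)
fabric = Fabric(accelerator="cuda", devices=1, precision="bf16-mixed", strategy=strategy)
fabric.launch()
```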

I think @lantiga is interested in layer offloading. Ollama/llama.cpp use layer offloading, but they definitely use something more to achieve decent latency. The question is whether we can get something similar...
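
For reference, on the llama.cpp side layer offloading is just a single knob; a sketch with the llama-cpp-python bindings and a made-up model path:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=20,   # put the first 20 transformer layers on the GPU, the rest stay on CPU
    n_ctx=2048,
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```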

These are all good questions.

> We need to evaluate if we want to make this change

1. I believe all the latest models don't have an interleaved placement in...