Andrei-Aksionov


Maybe it's about time to have a separate .py file with the shared logic? All the `prepare.py` files, for Shakespeare and these two new datasets, basically do the same thing...
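
A minimal sketch of what such a shared module could look like; the file and function names (`prepare_common.py`, `prepare`) are placeholders of mine, not the actual ones in the repo:

```python
# prepare_common.py -- hypothetical shared module; names are placeholders
from pathlib import Path
from typing import Callable

import numpy as np
import requests


def prepare(url: str, destination: Path, encode: Callable[[str], list], val_fraction: float = 0.1) -> None:
    """Download a raw text dataset, tokenize it and write train/val splits as .bin files."""
    destination.mkdir(parents=True, exist_ok=True)

    # download the raw text only once
    raw_path = destination / "input.txt"
    if not raw_path.exists():
        raw_path.write_text(requests.get(url, timeout=60).text, encoding="utf-8")

    # tokenize and split into train/val
    data = raw_path.read_text(encoding="utf-8")
    split_at = int(len(data) * (1 - val_fraction))
    for split, chunk in (("train", data[:split_at]), ("val", data[split_at:])):
        ids = np.array(encode(chunk), dtype=np.uint16)
        ids.tofile(destination / f"{split}.bin")
```

Each dataset-specific `prepare.py` would then shrink to a single call with its own URL and tokenizer.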

Hello @maximedb. I have a feeling you've already found the answer, since you asked this question last December; nevertheless, I would like to answer it. In the basic implementation...

@rasbt How much of an improvement in VRAM consumption did you see with LoRA+GaLore? With any PEFT algo the number of parameters to optimize shouldn't be that significant.
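
To put a rough number on it, here is a toy illustration (a stand-in model, not the actual setup): the trainable LoRA parameters, and therefore the optimizer states that GaLore would shrink, are a tiny fraction of the total.

```python
# Toy sanity check: with LoRA only a small fraction of parameters is trainable,
# so the optimizer states are already small before GaLore enters the picture.
import torch.nn as nn


def trainable_fraction(model: nn.Module) -> float:
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total


base = nn.Linear(4096, 4096, bias=False)   # frozen "pretrained" weight
base.weight.requires_grad = False
lora_a = nn.Linear(4096, 8, bias=False)    # trainable low-rank adapters, rank 8
lora_b = nn.Linear(8, 4096, bias=False)

model = nn.ModuleDict({"base": base, "lora_a": lora_a, "lora_b": lora_b})
print(f"trainable fraction: {trainable_fraction(model):.2%}")  # ~0.39%
```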

Hey @Dev-Khant, you don't need my approval, since I'm not a maintainer (though I approved anyway 🙂). @rasbt, since you are a markdown Jedi, could you look at the changes...

Sure, I can do this. Hopefully it's not too complicated 🤞

So, there are three models available. As can be seen from the table, the `2b` and `7b` models are mostly for code completion; they require a special prompt in the format: ```text...

What if there is only a single device, but the model doesn't fit? Shouldn't the code switch to layer offloading? I think the DeepSpeed strategy from Fabric supports it.
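
A minimal sketch of what I mean, assuming the `DeepSpeedStrategy` arguments in Lightning Fabric are still named like this (`stage`, `offload_parameters`, `offload_optimizer`):

```python
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    stage=3,                   # ZeRO stage 3: partition params, grads and optimizer states
    offload_parameters=True,   # spill parameters to CPU when the GPU runs out of memory
    offload_optimizer=True,    # keep optimizer states on CPU as well
)
fabric = Fabric(accelerator="cuda", devices=1, precision="bf16-mixed", strategy=strategy)
fabric.launch()
```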

I think @lantiga is interested in layer offloading. Ollama/llama.cpp use layer offloading, but they definitely use something more to achieve decent latency. The question is whether we can get something similar...
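
For reference, on the llama.cpp side layer offloading is just a single knob; a sketch with the llama-cpp-python bindings and a made-up model path:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=20,   # put the first 20 transformer layers on the GPU, the rest stay on CPU
    n_ctx=2048,
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```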

These are all good questions.

> We need to evaluate if we want to make this change

1. I believe all the latest models don't have an interleaved placement in...