Torsten Scholak

Results: 114 comments of Torsten Scholak

Hey @jlamypoirier. This introduces a lot of complexity, and based on your own comments, it is not yet user-friendly or fully supported. Given that this feature is not urgent, I'd...

I think this can work well with Python's implicit namespace packages (https://peps.python.org/pep-0420/). In a nutshell, we can leave the fast_llm namespace open, so that third-party packages can add to it...
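
A minimal sketch of how PEP 420 implicit namespace packages behave, assuming an illustrative layout (the package and module names below are made up, not the actual Fast-LLM layout):

```python
# Hypothetical layout, spread across two separately installed distributions:
#
#   fast_llm/                <- core distribution, NO __init__.py at this level
#       core/
#           __init__.py
#   fast_llm/                <- shipped by a third-party add-on distribution,
#       my_extension/           also without a top-level __init__.py
#           __init__.py
#
# Because neither distribution places an __init__.py directly under fast_llm/,
# Python treats fast_llm as a namespace package and merges every portion it
# finds on sys.path at import time.

import fast_llm.core          # portion from the core distribution
import fast_llm.my_extension  # portion from the third-party add-on

# A _NamespacePath spanning both install locations:
print(fast_llm.__path__)
```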

Option 2 is probably best. Do we really need this though, especially now? I'd rather focus on making the codebase more friendly to work with for people who need or want...

Hey @jlamypoirier, thanks for kicking off LoRA! However, the list of features above doesn't align with the scoped execution plan of #149. Specifically: * we only need LoRA on `Wq`...

LoRA by default applies only to Q and V because this provides the best tradeoff between efficiency, compute, and fine-tuning performance. The plan in #149 was to follow this standard,...

LoRA does not work the way you're assuming. In standard LoRA fine-tuning, the MLP remains frozen (like everything else in the transformer). Only the LoRA weights on Q and V...
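
A minimal PyTorch sketch of that point, not Fast-LLM code: everything stays frozen except the low-rank adapters attached to the Q and V projections. Names like `LoRALinear`, `hidden`, and `rank` are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update W + B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

hidden = 64
layers = nn.ModuleDict({
    "q": LoRALinear(nn.Linear(hidden, hidden)),   # LoRA on Q
    "k": nn.Linear(hidden, hidden),               # frozen, no LoRA
    "v": LoRALinear(nn.Linear(hidden, hidden)),   # LoRA on V
    "mlp": nn.Linear(hidden, 4 * hidden),         # frozen, no LoRA
})
for module in (layers["k"], layers["mlp"]):
    for p in module.parameters():
        p.requires_grad_(False)

# Only the lora_a / lora_b parameters of q and v remain trainable.
print([n for n, p in layers.named_parameters() if p.requires_grad])
```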

We do not need LoRA for MLP.

It sounds like some design choices in Fast-LLM are making LoRA harder to implement. Can you clarify whether keeping the tensor merged is actually necessary for performance, or whether we...

Why do you need to split K and V again? If they're stored as one tensor (which remains frozen for PEFT), can't you just extract V and apply LoRA there?...
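
A hypothetical sketch of what that question suggests, assuming a merged K/V projection (this is not Fast-LLM's actual implementation; `hidden` and `rank` are made up): keep the merged weight frozen, and add a low-rank update only to the V half of its output.

```python
import torch
import torch.nn as nn

class MergedKVWithLoRAOnV(nn.Module):
    def __init__(self, hidden: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # One frozen projection producing [K ; V], concatenated on the last dim.
        self.kv_proj = nn.Linear(hidden, 2 * hidden, bias=False)
        self.kv_proj.weight.requires_grad_(False)
        # Low-rank adapter applied only to the V part of the output.
        self.lora_a = nn.Parameter(torch.randn(rank, hidden) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(hidden, rank))
        self.scaling = alpha / rank
        self.hidden = hidden

    def forward(self, x: torch.Tensor):
        kv = self.kv_proj(x)                  # (..., 2 * hidden), frozen path
        k, v = kv.split(self.hidden, dim=-1)  # slice the output, not the weight
        v = v + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
        return k, v

layer = MergedKVWithLoRAOnV(hidden=64)
k, v = layer(torch.randn(2, 10, 64))
print(k.shape, v.shape)  # torch.Size([2, 10, 64]) torch.Size([2, 10, 64])
```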

The issue here is that Fast-LLM now has its own way of specifying where LoRA is applied, which is different from how PEFT does it (see [PEFT's LoraConfig](https://github.com/huggingface/peft/blob/de88c703065fdd3a05521da1054c0463d16ea33c/src/peft/tuners/lora/config.py#L222)). Had we...
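
For reference, this is roughly how PEFT specifies where LoRA is applied: `target_modules` selects layers by (suffixes of) their module names. The names below match common Hugging Face decoder implementations and are only an example, not Fast-LLM's naming.

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LoRA on Q and V only
    task_type="CAUSAL_LM",
)
```

With `get_peft_model(model, peft_config)`, PEFT then wraps every module whose name matches one of the targets.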