Torsten Scholak

Results: 114 comments of Torsten Scholak

Hey @jlamypoirier. This introduces a lot of complexity, and based on your own comments, it is not yet user-friendly or fully supported. Given that this feature is not urgent, I'd...

I think this can work well with Python's implicit namespace packages (https://peps.python.org/pep-0420/). In a nutshell, we can leave the fast_llm namespace open, so that third-party packages can add to it...
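
A minimal sketch of how PEP 420 implicit namespace packages behave, assuming an illustrative layout (the package and module names below are made up, not the actual Fast-LLM layout):

```python
# Hypothetical layout, spread across two separately installed distributions:
#
#   fast_llm/                <- core distribution, NO __init__.py at this level
#       core/
#           __init__.py
#   fast_llm/                <- shipped by a third-party add-on distribution,
#       my_extension/           also without a top-level __init__.py
#           __init__.py
#
# Because neither distribution places an __init__.py directly under fast_llm/,
# Python treats fast_llm as a namespace package and merges every portion it
# finds on sys.path at import time.

import fast_llm.core          # portion from the core distribution
import fast_llm.my_extension  # portion from the third-party add-on

# A _NamespacePath spanning both install locations:
print(fast_llm.__path__)
```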

Option 2 is probably best. Do we really need this though, especially now? I'd rather focus on making the codebase more friendly to work with for people who need or want...

Hey @jlamypoirier, thanks for kicking off LoRA! However, the list of features above doesn't align with the scoped execution plan of #149. Specifically: * we only need LoRA on `Wq`...

LoRA by default applies only to Q and V because this provides the best tradeoff between efficiency, compute, and fine-tuning performance. The plan in #149 was to follow this standard,...

LoRA does not work the way you're assuming. In standard LoRA fine-tuning, the MLP remains frozen (like everything else in the transformer). Only the LoRA weights on Q and V...
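
A minimal PyTorch sketch of that point, not Fast-LLM code: everything stays frozen except the low-rank adapters attached to the Q and V projections. Names like `LoRALinear`, `hidden`, and `rank` are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update W + B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

hidden = 64
layers = nn.ModuleDict({
    "q": LoRALinear(nn.Linear(hidden, hidden)),   # LoRA on Q
    "k": nn.Linear(hidden, hidden),               # frozen, no LoRA
    "v": LoRALinear(nn.Linear(hidden, hidden)),   # LoRA on V
    "mlp": nn.Linear(hidden, 4 * hidden),         # frozen, no LoRA
})
for module in (layers["k"], layers["mlp"]):
    for p in module.parameters():
        p.requires_grad_(False)

# Only the lora_a / lora_b parameters of q and v remain trainable.
print([n for n, p in layers.named_parameters() if p.requires_grad])
```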

We do not need LoRA for MLP.

It sounds like some design choices in Fast-LLM are making LoRA harder to implement. Can you clarify whether keeping the tensor merged is actually necessary for performance, or whether we...

Why do you need to split K and V again? If they're stored as one tensor (which remains frozen for PEFT), can't you just extract V and apply LoRA there?...
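
A hypothetical sketch of what that question suggests, assuming a merged K/V projection (this is not Fast-LLM's actual implementation; `hidden` and `rank` are made up): keep the merged weight frozen, and add a low-rank update only to the V half of its output.

```python
import torch
import torch.nn as nn

class MergedKVWithLoRAOnV(nn.Module):
    def __init__(self, hidden: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # One frozen projection producing [K ; V], concatenated on the last dim.
        self.kv_proj = nn.Linear(hidden, 2 * hidden, bias=False)
        self.kv_proj.weight.requires_grad_(False)
        # Low-rank adapter applied only to the V part of the output.
        self.lora_a = nn.Parameter(torch.randn(rank, hidden) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(hidden, rank))
        self.scaling = alpha / rank
        self.hidden = hidden

    def forward(self, x: torch.Tensor):
        kv = self.kv_proj(x)                  # (..., 2 * hidden), frozen path
        k, v = kv.split(self.hidden, dim=-1)  # slice the output, not the weight
        v = v + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
        return k, v

layer = MergedKVWithLoRAOnV(hidden=64)
k, v = layer(torch.randn(2, 10, 64))
print(k.shape, v.shape)  # torch.Size([2, 10, 64]) torch.Size([2, 10, 64])
```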

The issue here is that Fast-LLM now has its own way of specifying where LoRA is applied, which is different from how PEFT does it (see [PEFT's LoraConfig](https://github.com/huggingface/peft/blob/de88c703065fdd3a05521da1054c0463d16ea33c/src/peft/tuners/lora/config.py#L222)). Had we...
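
For reference, this is roughly how PEFT specifies where LoRA is applied: `target_modules` selects layers by (suffixes of) their module names. The names below match common Hugging Face decoder implementations and are only an example, not Fast-LLM's naming.

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LoRA on Q and V only
    task_type="CAUSAL_LM",
)
```

With `get_peft_model(model, peft_config)`, PEFT then wraps every module whose name matches one of the targets.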