Request for Clarification on LoRA Adapter Portability, Merging, and Benchmarks
Hello Predibase Team,
First, thank you. After a quick but careful review of the code and documentation, I'd like to ask for clarification and raise some points regarding the claims and expectations around LoRA adapter portability, merging, and cross-architecture usage.
Honestly, I was extremely excited that someone had solved this, and then quickly disappointed.
1. LoRA Adapter Portability and Cross-Architecture Claims
The documentation and repo seem to suggest that LoRA adapters can be converted or composed across model families (e.g., Llama, Mistral, Qwen). From my understanding and experience, LoRA adapters are deeply tied to the base model's architecture, layer naming, and hyperparameters. Even small changes, such as attention mechanism details, hidden size, tokenizer, or training regime (and many more), can make adapters incompatible. Is there any technical mechanism in lorax that provides a true semantic or mathematical translation of adapters, or is this a best-effort attempt that matches by layer name and shape only? You can't simply take the weights, add them together, and expect it to work.
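To make the distinction concrete, here is what a purely structural name-and-shape check looks like. This is only a sketch, assuming standard PEFT-style `adapter_model.safetensors` files; it is not a description of lorax's internals:

```python
# Minimal sketch, assuming PEFT-style adapter checkpoints; NOT lorax's actual code.
# A name-and-shape check like this only proves structural alignment,
# not semantic compatibility between adapters.
from safetensors.torch import load_file

def structurally_compatible(adapter_a: str, adapter_b: str) -> bool:
    """True if both adapters target the same modules with identical shapes."""
    a = load_file(adapter_a)  # e.g. path to "adapter_model.safetensors"
    b = load_file(adapter_b)
    return a.keys() == b.keys() and all(a[k].shape == b[k].shape for k in a)

# Two adapters trained against different base checkpoints (or tokenizers)
# can pass this check on every key and shape and still be mutually incoherent.
```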
2. Adapter Merging in Autoregressive LLMs
While merging LoRA adapters is theoretically possible if tensor shapes and layer names align, my understanding is that, in practice, merging independently trained adapters in autoregressive LLMs rarely produces meaningful or stable results unless the adapters are highly compatible (jointly trained, similar tasks, etc.). Unlike diffusion/image models, where there is some empirical evidence of compositional merging of adapters, LLMs are highly sensitive to even slight adapter incompatibilities. Without careful design, merging adapters is more likely to degrade performance than enhance it. This concern applies even to adapters trained on the exact same base model, never mind cross-architecture or differently hyperparameter-tuned ones: merging LoRAs that only differ in training dataset/domain is already problematic, and as far as I know there is no established method for merging autoregressive LLM LoRAs that reliably enhances performance.
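To illustrate what "just adding the weights together" amounts to: each LoRA adapter contributes a low-rank update dW = (alpha / r) * B A to a layer, and if shapes align nothing stops you from summing or averaging those updates. The sketch below is my own illustration of that naive linear combination (hypothetical data structures, not lorax's merge path); it is arithmetically valid but carries no guarantee of semantic coherence:

```python
import torch

def naive_merge_delta(adapters, weights=None):
    """Weighted sum of per-adapter low-rank updates for a single layer.

    `adapters`: list of dicts with "A" (r x d_in), "B" (d_out x r),
    "alpha", and "r" -- the usual LoRA parametrisation (hypothetical structure).
    """
    weights = weights or [1.0 / len(adapters)] * len(adapters)
    delta = torch.zeros(adapters[0]["B"].shape[0], adapters[0]["A"].shape[1])
    for w, ad in zip(weights, adapters):
        delta += w * (ad["alpha"] / ad["r"]) * (ad["B"] @ ad["A"])
    return delta  # applied as W_merged = W_base + delta

# Two rank-8 adapters on a 4096x4096 projection: the shapes compose cleanly,
# but nothing here says the combined update is meaningful.
a1 = {"A": torch.randn(8, 4096), "B": torch.randn(4096, 8), "alpha": 16, "r": 8}
a2 = {"A": torch.randn(8, 4096), "B": torch.randn(4096, 8), "alpha": 16, "r": 8}
dW = naive_merge_delta([a1, a2])
```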
3. Benchmarks, Validation, and "It Just Works"
I have not found any functional benchmarks or NLP task-based validation in the repo that demonstrates improvement (or even preservation of performance) after merging multiple LoRA adapters or using cross-architecture adapters. The existing test suite focuses on structural and API-level checks, not on real downstream performance. In my opinion, unless there is empirical evidence—such as regression tests, benchmarks, or even ablation studies showing improvement after merging—these claims should be clearly documented as experimental or unsupported. Without such evidence, “if it fits, it ships” merging is not a reliable strategy.
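For concreteness, the kind of minimal regression check I have in mind would compare a downstream metric (e.g., held-out perplexity) for each adapter alone against the merged adapter. This is only a sketch: the model handles, the evaluation set, and the 5% tolerance are placeholders, and it assumes Hugging Face transformers-style causal LM outputs.

```python
import math
import torch

@torch.no_grad()
def held_out_perplexity(model, tokenizer, texts, device="cuda"):
    """Average perplexity of a causal LM over a list of evaluation texts."""
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        loss = model(**enc, labels=enc["input_ids"]).loss
        losses.append(loss.item())
    return math.exp(sum(losses) / len(losses))

# Hypothetical usage: model_a and model_b carry the individual adapters,
# model_merged carries the merged one; eval_texts is a held-out sample.
# ppl_a = held_out_perplexity(model_a, tok, eval_texts)
# ppl_b = held_out_perplexity(model_b, tok, eval_texts)
# ppl_m = held_out_perplexity(model_merged, tok, eval_texts)
# assert ppl_m <= 1.05 * max(ppl_a, ppl_b), "merged adapter regressed on held-out data"
```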
4. Possibilities in Diffusion Models vs. LLMs
I want to acknowledge that in the diffusion/image model domain, there is evidence that merging adapters can often work, likely due to the compositional nature of the learned representations. However, even in diffusion, robust and predictable adapter merging is still an active research area and not a solved problem. In autoregressive LLMs, we have yet to see convincing demonstrations of effective, general-purpose LoRA merging. This important distinction should be clearly reflected in the lorax documentation to set realistic expectations for users.
5. Request for Documentation Clarity
Given the above, I strongly suggest clarifying in the documentation:
- That cross-architecture and arbitrary multi-LoRA adapter merging is experimental, with no current evidence of functional improvement in LLMs.
- That users should not expect reliable performance gains unless regression tests or benchmarks are provided for their specific use case.
- That the system will attempt merges if shapes match, but “it works if it works, else it doesn’t”—and there is no guarantee.
I appreciate the ambition and engineering in lorax, but I believe it's vital for the broader community (especially those less familiar with the nuances of LoRA adapters) that these limitations and the current state of research are made explicit in the documentation.
Thank you for your attention to these points, and I look forward to your insights or any evidence you might share.
Best regards,
Sujan Mishra