Benjamin Bossan
@phemw The PR is ready from my side, if you want to give this a try, LMK what you find. Note that the memory overhead of caching is quite significant...
> it looks like peft is trying to use the config object as a dict, but it is not a dict. is there a specific version of peft that needs...
Thanks for the report. If possible, could you please try out one thing: go into the PEFT source code and change the method like so:

```python
def get_param(self):
    param = ...
```
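Since the snippet above is cut off, here is a rough idea of the kind of defensive change being discussed, as a minimal sketch rather than the actual PEFT source: the helper name `get_model_config` and its call site are assumptions. The point is simply to accept either a plain `dict` or a `transformers` config object by falling back to `to_dict()`.

```python
# Minimal sketch, not the actual PEFT code: `get_model_config` is a
# hypothetical helper showing how a config object could be normalized
# to a dict before being indexed like one.
from transformers import PretrainedConfig


def get_model_config(model) -> dict:
    config = getattr(model, "config", {})
    if isinstance(config, dict):
        return config
    if isinstance(config, PretrainedConfig):
        # transformers config objects expose to_dict(); plain dicts do not
        return config.to_dict()
    # fall back to whatever dict-like representation the object offers
    return dict(config)
```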
@qiu-xiao-0330 I tried to reproduce the issue but was unsuccessful so far. Could you please tell me what model you're trying to train?
Thanks for the info @qiu-xiao-0330. Unfortunately, that model is too big for me to train (same with 30B), so I tried a smaller variant of the model, `"yujiepan/qwen3-vl-moe-tiny-random"`. For the...
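For reference, a reproduction attempt like this usually boils down to attaching LoRA to the small checkpoint and running a couple of training steps. A minimal sketch follows, assuming the tiny model loads via `AutoModelForCausalLM`, exposes `q_proj`/`v_proj` modules, and has a top-level `vocab_size` (all assumptions; the actual repro script is not shown here):

```python
# Rough repro sketch, not the actual script used: model class, target
# modules, and training details are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "yujiepan/qwen3-vl-moe-tiny-random"
model = AutoModelForCausalLM.from_pretrained(model_id)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# a couple of dummy steps to check that forward/backward works;
# vocab_size lookup may need adjusting for nested VL configs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dummy_input = torch.randint(0, model.config.vocab_size, (2, 16))
for _ in range(2):
    out = model(input_ids=dummy_input, labels=dummy_input)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```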
> 1. When I used the 30B model, I found that tensor parallelization was not enabled.

We haven't checked if/how PEFT targeting layers with tensor parallelism works. Could you ideally share...
_not stale_
Thanks @NouamaneELGueddarii. Check out the quantization support for LoRA, e.g. for [bitsandbytes](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/bnb.py). Also, feel free to open an early draft PR in case you encounter any roadblocks.
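To give a feel for the pattern in that file: a quantized-backend LoRA layer keeps the quantized base layer frozen and adds the trainable low-rank update on top of its output in `forward`. The sketch below is a simplified, generic illustration, not PEFT's actual `bnb.py` code; the class name `QuantizedLoraLinear` and the use of a plain module as the base layer are assumptions.

```python
# Simplified illustration of the quantized-LoRA pattern, not PEFT's bnb.py:
# the frozen base layer (in practice a bitsandbytes Linear8bitLt/Linear4bit)
# is left untouched, and a small trainable low-rank update is added on top.
import torch
import torch.nn as nn


class QuantizedLoraLinear(nn.Module):  # hypothetical name
    def __init__(self, base_layer: nn.Module, in_features: int, out_features: int,
                 r: int = 8, lora_alpha: int = 16, lora_dropout: float = 0.0):
        super().__init__()
        self.base_layer = base_layer            # frozen (quantized) linear
        for p in self.base_layer.parameters():
            p.requires_grad = False
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)      # start as a no-op update
        self.scaling = lora_alpha / r
        self.dropout = nn.Dropout(lora_dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        result = self.base_layer(x)             # quantized matmul, unchanged
        result = result + self.lora_B(self.lora_A(self.dropout(x))) * self.scaling
        return result
```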
Thanks for the suggestion. So from my understanding, if we wanted to use that functionality, it would involve a complete rewrite of the corresponding adapters. I think this is a...
What I could envision is to have a separate implementation with a different name. With a separate implementation, we can ensure:

- existing code not breaking
- not requiring full...