Angainor Development
I just updated to git+https://github.com/kooshi/transformers.git@balanced_memory_8bit

> how did you force max_memory?

I edited finetune.py line 78 to use `max_memory={0: "15GB", 1: "15GB"},`. This seems to have no effect,...
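For reference, this is roughly how I'd expect `max_memory` to be wired into the model load (a sketch only; the model id is a placeholder and the exact line in finetune.py may differ):

```python
import torch
from transformers import AutoModelForCausalLM

# Cap each GPU at ~15GB so the device_map balancing can spread the
# 8-bit weights across both cards instead of filling GPU 0 first.
model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder model id
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "15GB", 1: "15GB"},
)
```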
Yeah, but `torch.cuda.device_count()` correctly detects the 2 GPUs. `CUDA_VISIBLE_DEVICES` was not set; I explicitly set it to `CUDA_VISIBLE_DEVICES=0,1`, no change. The second one gets a bit of VRAM used when...
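For anyone debugging the same thing, a quick diagnostic sketch (plain PyTorch, nothing alpaca-lora specific) to confirm what the process actually sees:

```python
import os
import torch

# Show which devices are visible to this process and how much memory each reports.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("device_count =", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} {props.name} {props.total_memory / 1024**3:.1f} GiB")
```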
Thanks for the follow-up. Agreed, something could be broken in my setup; I'll start from a clean one next time I try. Thanks!
See this other (same) issue and its answers: https://github.com/tloen/alpaca-lora/issues/8#issuecomment-1477490259 Training on multiple GPUs is possible with torchrun; you'll double the batch size and halve the training time. Take care of...
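The part to take care of is the gradient accumulation bookkeeping under DDP; a sketch of the idea (variable names are illustrative, check finetune.py for the actual code):

```python
import os

batch_size = 128        # global effective batch size to keep constant
micro_batch_size = 4    # per-step, per-GPU batch size

gradient_accumulation_steps = batch_size // micro_batch_size

# torchrun sets WORLD_SIZE; with N processes each rank already contributes
# micro_batch_size samples per step, so divide the accumulation steps
# accordingly to keep the same effective batch size.
world_size = int(os.environ.get("WORLD_SIZE", 1))
if world_size > 1:
    gradient_accumulation_steps //= world_size
```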
Just to clarify, because I read several questions around this and I'd like to understand the rationale behind it: before this commit https://github.com/tloen/alpaca-lora/commit/b12c3b90f808e7d62709aad104d4fac1fbc880eb the prompt was masked in the labels....
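For context, "masking the prompt in the labels" means setting the prompt tokens' label ids to -100 so the loss is only computed on the response; a minimal sketch (tokenizer and prompt/response split are illustrative, not the repo's exact code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")  # placeholder

prompt = "### Instruction:\nSay hi.\n\n### Response:\n"
response = "Hi!"

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
full_ids = tokenizer(prompt + response, add_special_tokens=False)["input_ids"]

# -100 is ignored by the cross-entropy loss, so only the response tokens
# contribute to the training signal.
labels = [-100] * len(prompt_ids) + full_ids[len(prompt_ids):]
```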
Oh, OK! Thanks!
Great suggestion! I'd like this to be an extra param, or to be auto-computed from the number of cores / number of GPUs. When running with DDP for instance, we don't...
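Something along these lines is what I mean by auto-computing it (a sketch, not a patch; the function name is illustrative):

```python
import os
import torch

def default_num_workers() -> int:
    """Split the available CPU cores across the local GPU processes."""
    cores = os.cpu_count() or 1
    gpus = max(torch.cuda.device_count(), 1)
    # Under DDP each rank spawns its own dataloader workers, so divide
    # the cores between ranks to avoid oversubscribing the machine.
    return max(cores // gpus, 1)
```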
Yep. I'll propose a deeper check in a future PR, as all these params have to match and be consistent with each other and with the training dataset size. For instance, in...
> What do you think of an alternative fix where:

Yep, what I had in mind was a consistency check of the entangled params, with a warning and auto-fix if possible....
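A rough sketch of the kind of consistency check I have in mind (names mirror finetune.py's args, but the exact checks are open for discussion):

```python
import warnings

def check_entangled_params(batch_size, micro_batch_size, world_size, dataset_size):
    # The params are entangled: batch_size should be divisible by
    # micro_batch_size * world_size, and the dataset has to be large
    # enough to fill at least one optimizer step.
    if batch_size % (micro_batch_size * world_size) != 0:
        warnings.warn(
            "batch_size is not divisible by micro_batch_size * world_size; "
            "rounding gradient_accumulation_steps down."
        )
    grad_accum = max(batch_size // (micro_batch_size * world_size), 1)
    if dataset_size < micro_batch_size * world_size * grad_accum:
        warnings.warn("Training dataset is smaller than one effective batch.")
    return grad_accum
```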
Why don't we set adapter_name from now on, to avoid tweaking lib code? The default name in the current peft lib is `adapter_name="default"`; just adding this at training end in `get_peft_model_state_dict(` would...
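i.e. something like this at the end of training (sketch only; `model` is the PEFT-wrapped model):

```python
from peft import get_peft_model_state_dict, set_peft_model_state_dict

# Name the adapter explicitly instead of relying on patched lib code;
# "default" matches the current peft default.
adapter_state = get_peft_model_state_dict(model, adapter_name="default")

# When loading the checkpoint back, use the same adapter name.
set_peft_model_state_dict(model, adapter_state, adapter_name="default")
```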