generic-username0718
hell, just load the LoRA on your phone and refresh the page... it bugs out...
> > Did you manage to find a solution?
>
> Yes (but no). I tried to load in 8-bit mode: `python server.py --model llama-7b --lora alpaca --load-in-8bit`
>
> ...
I think I'm running into this bug: https://github.com/huggingface/peft/issues/115#issuecomment-1460706852. It looks like I may need to modify `PeftModel.from_pretrained` or `PeftModelForCausalLM`, but I'm not sure where...
> For me/us, this fixed 8bit and 4bit with LoRA mode: [#332 (comment)](https://github.com/oobabooga/text-generation-webui/issues/332#issuecomment-1474883977)

Are you splitting the model in a multi-GPU setup?
Yeah, I think that's my problem... Looks like this guy may have done it... something about autocast? https://github.com/huggingface/peft/issues/115#issuecomment-1441016348

```
with torch.cuda.amp.autocast():
    outputs = model.generate(input_ids=inputs['input_ids'], max_new_tokens=10)
```
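For what it's worth, here's a rough, self-contained sketch of that workaround as I understand it: load the base model in 8-bit split across the GPUs with `device_map="auto"`, attach the LoRA with `PeftModel.from_pretrained`, and wrap `generate()` in autocast. The model name and adapter path below are just placeholders, not necessarily what anyone here is running.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "decapoda-research/llama-7b-hf"   # placeholder base model
adapter = "tloen/alpaca-lora-7b"         # placeholder LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,   # bitsandbytes int8 weights
    device_map="auto",   # accelerate splits the layers across the available GPUs
)
model = PeftModel.from_pretrained(model, adapter)

inputs = tokenizer("Tell me about alpacas.", return_tensors="pt").to(model.device)

# Running generate() under autocast casts the mixed fp16/fp32 ops to a common
# dtype, which is what the linked comment suggests avoids the multi-GPU LoRA error.
with torch.cuda.amp.autocast():
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```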
Is there something I need to do to support LoRA in a multi-GPU configuration?
Awesome stuff. I'm able to load LLaMA-7b, but trying to load LLaMA-13b crashes with this error:

```
Traceback (most recent call last):
  File "/home/user/Documents/oobabooga/text-generation-webui/server.py", line 189, in <module>
    shared.model, shared.tokenizer = ...
```
Anyone reading this: you can get past the issue above by changing the `world_size` variable found in modules/LLaMA.py like this:

```
def setup_model_parallel() -> Tuple[int, int]:
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    world_size...
```
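In case it helps, a minimal sketch of what that change might look like, assuming the idea is to hard-code `world_size` to the number of GPUs / checkpoint shards (2 for LLaMA-13b) instead of reading it from the environment, with the rest of the function following the reference LLaMA setup code; the exact value and surrounding lines are assumptions, not the author's diff.

```
import os
from typing import Tuple

import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel


def setup_model_parallel() -> Tuple[int, int]:
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    # Hard-coded instead of int(os.environ.get("WORLD_SIZE", -1)), which
    # gives -1 when the env var isn't set. 2 matches two GPUs and the two
    # LLaMA-13b checkpoint shards; adjust for your setup.
    world_size = 2

    torch.distributed.init_process_group("nccl")
    initialize_model_parallel(world_size)
    torch.cuda.set_device(local_rank)

    # Seed must be the same in all processes.
    torch.manual_seed(1)
    return local_rank, world_size
```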
Is there a parameter I need to pass to oobabooga to tell it to split the model across my two 3090 GPUs?