generic-username0718

16 comments by generic-username0718

hell, just load the LoRA on your phone and refresh the page... it bugs out...

I think I'm running into this bug https://github.com/huggingface/peft/issues/115#issuecomment-1460706852 Looks like I may need to modify PeftModel.from_pretrained or PeftModelForCausalLM but I'm not sure where...
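For context, this is roughly how the LoRA gets attached on my end (a minimal sketch; the model and adapter names are placeholders, not my exact setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder model/adapter names -- substitute your own.
base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map={"": 0},        # keep every layer on GPU 0
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")
tokenizer = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
```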

> For me/us, this fixed 8bit and 4bit with LoRA mode: [#332 (comment)](https://github.com/oobabooga/text-generation-webui/issues/332#issuecomment-1474883977)

Are you splitting the model in a multi-gpu setup?

Yeah, I think that's my problem... Looks like this guy may have done it... something about autocast? https://github.com/huggingface/peft/issues/115#issuecomment-1441016348

```python
with torch.cuda.amp.autocast():
    outputs = model.generate(input_ids=inputs['input_ids'], max_new_tokens=10)
```

Is there something I need to do to support LoRA in a multi-gpu configuration?

![image](https://user-images.githubusercontent.com/126929561/226198854-8d88c304-1a60-434a-bfe4-f54c190e3a23.png)
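To be concrete, here's the kind of thing I'm trying to get working: a sketch that combines the autocast workaround from the linked comment with a model split over both cards (the model/adapter names, memory limits, and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder names; the per-GPU limits are made-up values for two 24GB cards.
base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",                     # let accelerate spread layers over both GPUs
    max_memory={0: "20GiB", 1: "20GiB"},
)
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")
tokenizer = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)

# The autocast wrapper is the workaround from the PEFT comment linked above.
with torch.cuda.amp.autocast():
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```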

Awesome stuff. I'm able to load LLaMA-7b, but trying to load LLaMA-13b crashes with the error:

```
Traceback (most recent call last):
  File "/home/user/Documents/oobabooga/text-generation-webui/server.py", line 189, in
    shared.model, shared.tokenizer =...
```

Anyone reading this: you can get past the issue above by changing the world_size variable found in modules/LLaMA.py like this:

```python
def setup_model_parallel() -> Tuple[int, int]:
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    world_size...
```
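The snippet above got cut off, so here's a guess at what the full workaround could look like; the fallback value for world_size is my assumption, not necessarily what the original change used:

```python
import os
from typing import Tuple

import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel

def setup_model_parallel() -> Tuple[int, int]:
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    # Hypothetical change: fall back to the local GPU count when WORLD_SIZE
    # isn't set (the exact value used in the comment above is truncated).
    world_size = int(os.environ.get("WORLD_SIZE", torch.cuda.device_count()))

    torch.distributed.init_process_group("nccl")
    initialize_model_parallel(world_size)
    torch.cuda.set_device(local_rank)

    torch.manual_seed(1)  # seed must match across processes
    return local_rank, world_size
```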

Is there a parameter I need to pass to oobabooga to tell it to split the model among my two 3090 GPUs?