bartman081523
> This is the output I get from conda list -p "C:\Users\patri\miniconda3\envs\textgen":

Your env looks alright, as far as I can tell.

> Wait - there is something wrong here....
I found this maybe relevant: https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb

```
from peft import PeftModel, PeftConfig

max_memory = {0: "1GIB", 1: "1GIB", 2: "2GIB", 3: "10GIB", "cpu": "30GB"}
peft_model_id = "smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM"
config = PeftConfig.from_pretrained(peft_model_id)...
```
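For reference, that notebook cell continues roughly like this (rewritten from memory of the linked example, so treat the exact arguments as an approximation): the adapter config points back to the base model, and `max_memory` lets accelerate spread the layers across the GPUs and CPU.

```python
# Sketch based on the linked peft example notebook - not copied verbatim.
from transformers import AutoModelForCausalLM
from peft import PeftModel, PeftConfig

max_memory = {0: "1GIB", 1: "1GIB", 2: "2GIB", 3: "10GIB", "cpu": "30GB"}
peft_model_id = "smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM"

# The adapter config stores which base model it was trained on.
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model sharded across devices according to max_memory,
# then attach the LoRA adapter weights on top of it.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path, device_map="auto", max_memory=max_memory
)
model = PeftModel.from_pretrained(
    model, peft_model_id, device_map="auto", max_memory=max_memory
)
model.eval()
```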
I tried without "--gptq-bits 4", and that failed with another error:

```
python server.py --model llama-7b --lora alpaca --listen --gpu-memory 11 --cpu-memory 16 --disk

===================================BUG REPORT===================================
Welcome to bitsandbytes. For...
```
> Did you manage to find a solution?

Yes (but no). I tried to load in 8-bit mode:

`python server.py --model llama-7b --lora alpaca --load-in-8bit`

In my opinion, this is...
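Under the hood, `--load-in-8bit` plus `--lora` amounts to something like the following (a sketch, not the webui's actual code; the `models/llama-7b` and `loras/alpaca` paths are just placeholders for wherever the weights live):

```python
# Rough sketch of 8-bit base model + LoRA adapter loading; paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_path = "models/llama-7b"  # placeholder path to the converted base weights
lora_path = "loras/alpaca"     # placeholder path to the LoRA adapter

# bitsandbytes quantizes the linear layers to int8 while loading.
model = AutoModelForCausalLM.from_pretrained(
    base_path, load_in_8bit=True, device_map="auto"
)

# The LoRA weights themselves stay in floating point and are applied
# on top of the quantized layers.
model = PeftModel.from_pretrained(model, lora_path)
model.eval()
```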
> Did you manage to find a solution?

I found a way to load a chat-finetuned model; although it is not Alpaca, it is still very good.

```
cd...
```
@wywywywy @BadisG found a way to fix 4-bit mode: https://github.com/oobabooga/text-generation-webui/issues/332#issuecomment-1474883977

Change the `lora.py` from the `peft` package:

- Windows: `C:\Users\Utilisateur\anaconda3\envs\textgen\lib\site-packages\peft\tuners\lora.py`
- Linux: `venv/lib/python3.10/site-packages/peft/tuners/lora.py`

Fixed `lora.py`: https://pastebin.com/eUWZsirk

@BadisG added those 2 instructions on...
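To be clear, the actual patch is the pastebin above. If I recall the issue correctly, the problem is a dtype mismatch: the quantized base layer runs in reduced precision while peft keeps `lora_A`/`lora_B` in float32, so the LoRA matmul needs the activations cast over and the result cast back. The snippet below is only a self-contained illustration of that kind of fix, with a made-up `TinyLoraLinear` class, not the patched `lora.py` itself.

```python
# Illustration only - the real patch is the pastebin linked above.
import torch
import torch.nn as nn

LOW = torch.bfloat16  # stand-in for the reduced precision of the quantized base layer


class TinyLoraLinear(nn.Module):
    """Toy layer with a low-precision base weight and float32 LoRA weights."""

    def __init__(self, in_features=8, out_features=8, r=2):
        super().__init__()
        self.base = nn.Linear(in_features, out_features).to(LOW)  # "quantized" base layer
        self.lora_A = nn.Linear(in_features, r, bias=False)       # float32, as peft keeps them
        self.lora_B = nn.Linear(r, out_features, bias=False)
        self.scaling = 1.0

    def forward(self, x):
        result = self.base(x.to(LOW))
        previous_dtype = x.dtype
        x = x.to(self.lora_A.weight.dtype)          # cast activations to the LoRA dtype
        lora_out = self.lora_B(self.lora_A(x)) * self.scaling
        return (result + lora_out.to(result.dtype)).to(previous_dtype)


print(TinyLoraLinear()(torch.randn(1, 8)).dtype)  # torch.float32
```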
> Good fix thank you. It worked. And i thank @BadisG
>
> But I wonder why not everybody faces the same problem? Other people can GPTQ 4bit without modifying...
> maybe this information needs to be in a pull request, as it's difficult to find.

I agree, and the patch is at this time for the peft module, not...
> are you splitting the model in a multi-gpu setup?

no.
With `git reset --hard` and `git pull` (update) and the peft fix below, it is now possible to load LoRA models in 4-bit or 8-bit with `--gptq-bits 4` or `--load-in-8bit`...