Illegal memory access when using a lora
Getting this on inference when I have a lora loaded (loading the lora itself doesn't produce any errors).
Using text-generation-webui.
File "/home/user/text-generation-webui/modules/models.py", line 309, in clear_torch_cache torch.cuda.empty_cache() File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/memory.py", line 133, in empty_cache torch._C._cuda_emptyCache() RuntimeError: CUDA error: an illegal memory access was encountered Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I just trained this with qlora. Unfortunately I can't use the Transformers loader because it takes between 15 and 45 minutes to load a lora (not exaggerating, I just waited 45 minutes for the last one before giving up), and I can't find any reports of the same issue. So I'm trying to load it with exllama on top of a GPTQ version of llama-2-70b. I'm not even sure that's possible, but previous loras I've trained with other libraries have worked fine on llama 1 GPTQ.
I don't think I'm out of VRAM; this is failing at a context size of maybe 20 tokens, and I'm on an A6000, single GPU, nothing fancy. I can go up to at least 3000 tokens of context with the Transformers loader when I'm patient enough to wait the half hour or whatever it takes to load, and there are no problems once it loads.
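A stripped-down script along these lines should hit the same code path outside the webui; it's patterned after exllama's example_lora.py, has to be run from inside the exllama repo so its modules are importable, and all the paths below are placeholders for my local files:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"   # synchronous kernel launches, so the traceback points closer to the failing kernel

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
from lora import ExLlamaLora

model_dir = "/models/Llama-2-70B-GPTQ"     # placeholder
lora_dir = "/loras/my-qlora-adapter"       # placeholder

config = ExLlamaConfig(os.path.join(model_dir, "config.json"))
config.model_path = os.path.join(model_dir, "model.safetensors")   # whatever the GPTQ .safetensors is actually called

model = ExLlama(config)
cache = ExLlamaCache(model)
tokenizer = ExLlamaTokenizer(os.path.join(model_dir, "tokenizer.model"))
generator = ExLlamaGenerator(model, tokenizer, cache)

# Attach the adapter the same way example_lora.py does
lora = ExLlamaLora(model,
                   os.path.join(lora_dir, "adapter_config.json"),
                   os.path.join(lora_dir, "adapter_model.bin"))
generator.lora = lora

print(generator.generate_simple("Hello, my name is", max_new_tokens=20))

If that crashes the same way, it would at least rule out the webui side of things.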
Possibly relevant args from my qlora training:
--lora_r 64 \
--lora_alpha 16 \
--lora_modules all \
--double_quant \
--quant_type nf4 \
--bf16 \
--bits 4 \
--lora_dropout 0.1
My adapter_config.json if it's relevant:
{ "auto_mapping": null, "base_model_name_or_path": "meta-llama/Llama-2-70b-hf", "bias": "none", "fan_in_fan_out": false, "inference_mode": true, "init_lora_weights": true, "layers_pattern": null, "layers_to_transform": null, "lora_alpha": 16.0, "lora_dropout": 0.1, "modules_to_save": null, "peft_type": "LORA", "r": 64, "revision": null, "target_modules": [ "v_proj", "gate_proj", "k_proj", "down_proj", "up_proj", "o_proj", "q_proj" ], "task_type": "CAUSAL_LM"
This is the file structure of the lora I have, not sure if relevant either: