
Illegal memory access when using a lora

Open sampbarrow opened this issue 11 months ago • 32 comments

I'm getting this during inference when I have a LoRA loaded (loading the LoRA itself doesn't produce any errors).

Using text-generation-webui.

File "/home/user/text-generation-webui/modules/models.py", line 309, in clear_torch_cache torch.cuda.empty_cache() File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/memory.py", line 133, in empty_cache torch._C._cuda_emptyCache() RuntimeError: CUDA error: an illegal memory access was encountered Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I just trained this with qlora. Unfortunately I can't use the Transformers loader because it takes 15-45 minutes to load a LoRA (not exaggerating; I just waited 45 minutes for the last one before giving up), and I can't find any reports of the same issue. So I'm trying to load this with exllama on top of a GPTQ version of llama-2-70b. I'm not even sure that's possible, but previous LoRAs I've trained with other libraries have worked fine on llama 1 GPTQ.
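For reference, this is roughly what I'd expect the load to look like outside of the webui, based on exllama's example_lora.py; the paths and filenames are placeholders, and I'm assuming that script's API is the intended way to attach an adapter:

```python
# Minimal sketch, assuming exllama's example_lora.py API; all paths are placeholders.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
from lora import ExLlamaLora

model_dir = "/models/llama-2-70b-gptq"   # GPTQ base model directory
lora_dir = "/loras/my-qlora-adapter"     # directory with adapter_config.json / adapter_model.bin

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/model.safetensors"

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Attach the LoRA to the generator for inference
lora = ExLlamaLora(model, f"{lora_dir}/adapter_config.json", f"{lora_dir}/adapter_model.bin")
generator.lora = lora

print(generator.generate_simple("Hello", max_new_tokens=20))
```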

I don't think I'm out of VRAM: this is failing with a context of maybe 20 tokens, and I'm on an A6000 (single GPU, nothing fancy). With Transformers I can go up to at least 3000 tokens of context, when I'm patient enough to wait the half hour or so it takes to load, and there are no problems once it loads.
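To back that up, a quick check of free VRAM right before generation looks like this (just a plain PyTorch sanity check, nothing webui-specific):

```python
import torch

# Report free vs. total memory on the current CUDA device (the A6000 here).
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")
print(f"allocated by torch: {torch.cuda.memory_allocated() / 2**30:.1f} GiB")
```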

Possibly relevant args from my qlora training:

```
--lora_r 64 \
--lora_alpha 16 \
--lora_modules all \
--double_quant \
--quant_type nf4 \
--bf16 \
--bits 4 \
--lora_dropout 0.1
```

My adapter_config.json if it's relevant:

{ "auto_mapping": null, "base_model_name_or_path": "meta-llama/Llama-2-70b-hf", "bias": "none", "fan_in_fan_out": false, "inference_mode": true, "init_lora_weights": true, "layers_pattern": null, "layers_to_transform": null, "lora_alpha": 16.0, "lora_dropout": 0.1, "modules_to_save": null, "peft_type": "LORA", "r": 64, "revision": null, "target_modules": [ "v_proj", "gate_proj", "k_proj", "down_proj", "up_proj", "o_proj", "q_proj" ], "task_type": "CAUSAL_LM"

This is the file structure of the lora I have, not sure if relevant either:

(screenshot of the LoRA output directory)
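In case the contents matter more than the filenames, this is how I'd dump the adapter tensors to check their names, shapes, and dtypes. I'm assuming the weights were saved as adapter_model.bin; a safetensors file would need safetensors.torch.load_file instead:

```python
import torch

# Inspect the saved adapter weights (assumed filename: adapter_model.bin).
state_dict = torch.load("adapter_model.bin", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)
```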

sampbarrow · Jul 21 '23