
invalid configuration argument

Open kingminsvn opened this issue 2 years ago • 4 comments

E:\tools\llama>main.exe -m ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin -ngl 32
main: build = 632 (35a8491)
main: seed  = 1686234538
ggml_init_cublas: found 4 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2080 Ti
  Device 1: NVIDIA GeForce RTX 2080 Ti
  Device 2: NVIDIA GeForce RTX 2080 Ti
  Device 3: NVIDIA GeForce RTX 2080 Ti
llama.cpp: loading model from ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 2080 Ti) as main device
llama_model_load_internal: mem required  = 3756.23 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: total VRAM used: 6564 MB
...............................................................................
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 24 / 48 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

CUDA error 9 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
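For reference: CUDA error 9 is cudaErrorInvalidConfiguration ("invalid configuration argument"), which the runtime raises when a kernel launch's grid/block dimensions are out of range, e.g. a grid dimension of 0. One plausible way that can happen in a multi-GPU split is a device's slice of the work coming out empty. A minimal standalone sketch of how this class of error surfaces, assuming a CUDA_CHECK-style macro similar to the one in ggml-cuda.cu; the kernel and variable names below are illustrative, not actual ggml code:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report CUDA errors with file and line, matching the log format above.
#define CUDA_CHECK(err)                                                        \
    do {                                                                       \
        cudaError_t err_ = (err);                                              \
        if (err_ != cudaSuccess) {                                             \
            fprintf(stderr, "CUDA error %d at %s:%d: %s\n",                    \
                    (int) err_, __FILE__, __LINE__, cudaGetErrorString(err_)); \
            exit(1);                                                           \
        }                                                                      \
    } while (0)

__global__ void dummy_kernel(float * x) { x[threadIdx.x] = 0.0f; }

int main() {
    float * d_x;
    CUDA_CHECK(cudaMalloc(&d_x, 256 * sizeof(float)));

    // Hypothetical failure mode: a block count derived from the per-device
    // share of the tensor ends up 0, making the launch configuration invalid.
    int num_blocks = 0; // grid dimensions must be >= 1
    dummy_kernel<<<num_blocks, 256>>>(d_x);

    // cudaGetLastError() now returns cudaErrorInvalidConfiguration (9),
    // whose error string is "invalid configuration argument".
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaFree(d_x));
    return 0;
}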

kingminsvn · Jun 07 '23 06:06

same here

Vencibo · Jun 07 '23 10:06

Same here. It seems to happen only when splitting the load across two GPUs. If I use the -ts (--tensor-split) parameter to force everything onto one GPU, such as -ts 1,0 or even -ts 0,1, it works. So that's at least a workaround in the meantime, just without multi-GPU (a concrete command follows the log below).

>main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct -m Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin --n-gpu-layers 40
main: build = 635 (5c64a09)
main: seed  = 1686175494
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090
  Device 1: NVIDIA GeForce RTX 3090
llama.cpp: loading model from Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 4090) as main device
llama_model_load_internal: mem required  = 2380.14 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 40 layers to GPU
llama_model_load_internal: total VRAM used: 13370 MB
...................................................................................................
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Human:'
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 64, repeat_penalty = 1.200000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 CUDA error 9 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
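Concretely, the same invocation pinned to a single GPU would look like this; -ts / --tensor-split takes per-device proportions, so 1,0 keeps all offloaded layers on device 0 (the 4090):

>main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct -m Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin --n-gpu-layers 40 -ts 1,0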

ThioJoe · Jun 07 '23 21:06

same here on main.exe and server

goyanx · Jun 07 '23 22:06

This issue seems to occur only on Windows systems with multiple graphics cards.

kingminsvn · Jun 08 '23 03:06

Still happening on latest build 0bf7cf1

ThioJoe · Jun 08 '23 22:06

Seems to be fixed at least as of 303f580

ThioJoe · Jun 10 '23 17:06

Getting this error on Linux after compiling with cuBLAS.

dillfrescott · Oct 25 '23 14:10

Same with https://huggingface.co/TheBloke/CausalLM-14B-GGUF

JoseConseco · Oct 26 '23 13:10

@JoseConseco Funny enough, it was the exact same model too.

dillfrescott · Oct 26 '23 18:10

Yes, this is a problem with the model, not with llama.cpp, so it is not related to the issue in this thread.

JoseConseco · Oct 26 '23 19:10