
[WIN11] bug: CUDA unknown error when loading a model


When launching a model (any model), I get this error:

```
2024-04-12T16:15:13.762Z [NITRO]::Debug: [1712938513] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1585][llama_server_context::update_slots] slot 0 released (53 tokens in cache)
2024-04-12T16:15:14.769Z [NITRO]::Debug: 20240412 16:15:13.763000 UTC 13672 INFO reached result stop - llamaCPP.cc:365
20240412 16:15:13.763000 UTC 13672 INFO End of result - llamaCPP.cc:338
20240412 16:15:13.784000 UTC 12180 INFO Task completed, release it - llamaCPP.cc:408
20240412 16:15:14.769000 UTC 13672 INFO sent the non stream, waiting for respone - llamaCPP.cc:416
[1712938514] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1585][llama_server_context::update_slots] slot 0 released (53 tokens in cache)
[1712938514] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 882][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 3]
2024-04-12T16:15:14.769Z [NITRO]::Debug: [1712938514] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1722][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)
2024-04-12T16:15:18.213Z [NITRO]::Error: CUDA error: unknown error
current device: 2, in function ggml_backend_cuda_buffer_cpy_tensor at C:\actions-runner_work\nitro\nitro\llama.cpp\ggml-cuda.cu:10951
cudaMemcpy((char *)dst->data, (const char *)src->data, ggml_nbytes(src), cudaMemcpyDeviceToDevice)
GGML_ASSERT: C:\actions-runner_work\nitro\nitro\llama.cpp\ggml-cuda.cu:242: !"CUDA error"
2024-04-12T16:15:23.349Z [NITRO]::Debug: Nitro exited with code: 3221226505
2024-04-12T16:23:35.463Z [NITRO]::Debug: Request to kill Nitro
2024-04-12T16:23:35.470Z [NITRO]::Debug: Nitro process is terminated
```

Steps to reproduce: simply download any model on Win11. I have 3x RTX 4090. This is my settings.json file:

```json
{
  "notify": true,
  "run_mode": "gpu",
  "nvidia_driver": { "exist": true, "version": "552.12" },
  "cuda": { "exist": true, "version": "12" },
  "gpus": [
    { "id": "0", "vram": "24564", "name": "NVIDIA GeForce RTX 4090\r", "arch": "ada" },
    { "id": "1", "vram": "24564", "name": "NVIDIA GeForce RTX 4090\r", "arch": "ada" },
    { "id": "2", "vram": "24564", "name": "NVIDIA GeForce RTX 4090", "arch": "ada" }
  ],
  "gpu_highest_vram": "0",
  "gpus_in_use": [ "0", "1", "2" ],
  "is_initial": false,
  "vulkan": false
}
```
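Since the assertion fires in a device-to-device `cudaMemcpy` with `current device: 2`, one diagnostic worth trying (not part of the original report, and assuming Jan honors the `gpus_in_use` key for GPU selection) is restricting the config to a single GPU to rule out a multi-GPU peer-copy problem:

```json
{
  "gpus_in_use": [ "0" ]
}
```

If the model loads with one GPU but fails with all three, that points at the cross-device copy rather than the model or driver install.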

The driver version and CUDA version are correct. I have CUDA 12.1; in the settings file it is written as 12. What could the problem be? I am also trying to load 7B models such as Llama, without success.
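To isolate the failure independently of Jan's settings file, the standard CUDA runtime environment variable `CUDA_VISIBLE_DEVICES` can hide all but one GPU from any CUDA process. A minimal sketch (the choice of GPU 0 is arbitrary; the app launch itself is left as a comment since the install path varies):

```shell
# Expose only GPU 0 to CUDA processes started from this shell.
# CUDA_VISIBLE_DEVICES is a standard CUDA runtime environment variable.
export CUDA_VISIBLE_DEVICES=0

# On Windows cmd the equivalent would be:   set CUDA_VISIBLE_DEVICES=0
# (PowerShell:  $env:CUDA_VISIBLE_DEVICES = "0")

# Verify the variable is set, then launch Jan from this same shell/session.
echo "$CUDA_VISIBLE_DEVICES"
```

If loading succeeds with a single visible device, the "unknown error" in `ggml_backend_cuda_buffer_cpy_tensor` is likely specific to the multi-GPU copy path.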

GlobalAIVision avatar Apr 12 '24 16:04 GlobalAIVision

Adding an additional log from users: https://github.com/janhq/jan/issues/1979#issuecomment-2076647120

Van-QA avatar Apr 25 '24 09:04 Van-QA