
bug: GPU accelerated model fails to load without a visible error message

Open · sgdesmet opened this issue 4 months ago · 1 comment

Jan version

0.5.5

Describe the Bug

I'm running Fedora 40 on a laptop with a GTX 1050 Ti with 4 GB of VRAM. When GPU acceleration is enabled and I attempt to run a model marked as 'Slow on your device' (such as Llama 3.2 3B Instruct Q8), it fails to start without any visible error message. At first glance, the logs show what appears to be a memory issue:

2024-10-03T09:04:05.888Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1026.00 MiB on device 0: cudaMalloc failed: out of memory

Is it correct that my device is unable to run this particular model? If so, I would expect a 'Not enough VRAM' indicator when downloading the model and an explicit error message when starting it.
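For reference, a pre-flight check of the kind requested here could compare the GGUF file size (a lower bound on the weights' memory footprint) against free VRAM before attempting a full GPU offload. The sketch below is only an illustration of that idea, not Jan's or Cortex's actual logic; the model path and the 25% overhead factor (for KV cache and CUDA buffers) are assumptions.

```python
import os
import subprocess

def free_vram_mib(device: int = 0) -> int:
    """Query free VRAM on one GPU via nvidia-smi (values are in MiB)."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={device}",
            "--query-gpu=memory.free",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip())

def likely_fits(gguf_path: str, device: int = 0, overhead: float = 1.25) -> bool:
    """Heuristic: weights plus ~25% headroom must fit in free VRAM.

    The overhead factor is an illustrative assumption, not a measured value.
    """
    weights_mib = os.path.getsize(gguf_path) / (1024 * 1024)
    return weights_mib * overhead <= free_vram_mib(device)

# Hypothetical local file name, for illustration only.
if not likely_fits("Llama-3.2-3B-Instruct-Q8_0.gguf"):
    print("Not enough VRAM: full GPU offload is unlikely to succeed.")
```

With a ~3.2 GB Q8_0 file and ~25% overhead, the estimate lands around 4 GB, which is consistent with the 1026 MiB cudaMalloc failure above on a 4 GB card that is also driving a desktop.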

Steps to Reproduce

  1. Install additional Cortex.cpp dependencies
  2. Enable GPU acceleration
  3. Download model Llama 3.2 3B Instruct Q8
  4. Start a new thread and enter some text
  5. Observe the 'Starting model' loading indicator
  6. Nothing happens (free VRAM during the load attempt can be monitored as sketched below)
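To capture how close the card is to its limit while reproducing steps 4-6, free VRAM can be polled once per second during the load attempt. This is just one way to record it (watching `nvidia-smi` in a second terminal works equally well):

```python
import subprocess
import time

# Poll free VRAM once per second; stop with Ctrl+C.
while True:
    free = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader"],
        text=True,
    ).strip()
    print(f"{time.strftime('%H:%M:%S')}  free VRAM: {free}")
    time.sleep(1)
```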

Screenshots / Logs

app.log

What is your OS?

  • [ ] macOS
  • [ ] Windows
  • [X] Linux

sgdesmet · Oct 03 '24 09:10