
bug: GPU accelerated model fails to load without a visible error message

Open · sgdesmet opened this issue 4 months ago · 1 comment

Jan version

0.5.5

Describe the Bug

I'm running Fedora 40 on a laptop with a GTX 1050 Ti with 4 GB of VRAM. When GPU acceleration is enabled and I attempt to run a model marked as 'Slow on your device' (such as Llama 3.2 3B Instruct Q8), it fails to start without any visible error message. At first glance, the logs show what appears to be a memory issue:

2024-10-03T09:04:05.888Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1026.00 MiB on device 0: cudaMalloc failed: out of memory

Is it correct that my device is unable to run this particular model? If so, I would expect a 'Not enough VRAM' indicator when downloading the model and an explicit error message when starting it.
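For reference, a pre-flight check of the kind requested here could compare the GGUF file size (a lower bound on the weights' memory footprint) against free VRAM before attempting a full GPU offload. The sketch below is only an illustration of that idea, not Jan's or Cortex's actual logic; the model path and the 25% overhead factor (for KV cache and CUDA buffers) are assumptions.

```python
import os
import subprocess

def free_vram_mib(device: int = 0) -> int:
    """Query free VRAM on one GPU via nvidia-smi (values are in MiB)."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={device}",
            "--query-gpu=memory.free",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip())

def likely_fits(gguf_path: str, device: int = 0, overhead: float = 1.25) -> bool:
    """Heuristic: weights plus ~25% headroom must fit in free VRAM.

    The overhead factor is an illustrative assumption, not a measured value.
    """
    weights_mib = os.path.getsize(gguf_path) / (1024 * 1024)
    return weights_mib * overhead <= free_vram_mib(device)

# Hypothetical local file name, for illustration only.
if not likely_fits("Llama-3.2-3B-Instruct-Q8_0.gguf"):
    print("Not enough VRAM: full GPU offload is unlikely to succeed.")
```

With a ~3.2 GB Q8_0 file and ~25% overhead, the estimate lands around 4 GB, which is consistent with the 1026 MiB cudaMalloc failure above on a 4 GB card that is also driving a desktop.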

Steps to Reproduce

  1. Install additional Cortex.cpp dependencies
  2. Enable GPU acceleration
  3. Download model Llama 3.2 3B Instruct Q8
  4. Start a new thread and enter some text
  5. Observe the 'Starting model' loading indicator
  6. Nothing happens (free VRAM during the load attempt can be monitored as sketched below)
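To capture how close the card is to its limit while reproducing steps 4-6, free VRAM can be polled once per second during the load attempt. This is just one way to record it (watching `nvidia-smi` in a second terminal works equally well):

```python
import subprocess
import time

# Poll free VRAM once per second; stop with Ctrl+C.
while True:
    free = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader"],
        text=True,
    ).strip()
    print(f"{time.strftime('%H:%M:%S')}  free VRAM: {free}")
    time.sleep(1)
```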

Screenshots / Logs

app.log

What is your OS?

  • [ ] macOS
  • [ ] Windows
  • [X] Linux

sgdesmet · Oct 03 '24 09:10