bug: GPU accelerated model fails to load without a visible error message
Jan version
0.5.5
Describe the Bug
I'm running Fedora 40 on a laptop with a GTX 1050 Ti (4 GB VRAM). With GPU acceleration enabled, models marked as 'Slow on your device' (such as Llama 3.2 3B Instruct Q8) fail to start without any visible error message. At first glance, the logs show what appears to be a memory issue:
```
2024-10-03T09:04:05.888Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1026.00 MiB on device 0: cudaMalloc failed: out of memory
```
Is it correct that my device is unable to run this particular model? If so, I would expect a 'Not enough VRAM' indicator when downloading the model and an explicit error message when the model fails to start.
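For illustration, a pre-flight check along these lines could back such an indicator. This is only a minimal sketch assuming the CUDA runtime API is available; the required-bytes figure for Llama 3.2 3B Instruct Q8 is my rough estimate, not Jan's or Cortex.cpp's actual accounting:

```cpp
// Hypothetical VRAM pre-check (not Jan/Cortex.cpp's actual code):
// query free device memory via the CUDA runtime before attempting to load,
// instead of letting cudaMalloc fail silently mid-load.
#include <cuda_runtime.h>
#include <cstdio>

bool has_enough_vram(size_t required_bytes) {
    size_t free_bytes = 0, total_bytes = 0;
    // cudaMemGetInfo reports free and total memory on the current device.
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        return false;  // treat a failed query as "cannot guarantee a fit"
    }
    return free_bytes >= required_bytes;
}

int main() {
    // ~3.4 GiB is a rough guess for the Q8 weights plus context buffers.
    const size_t required = 3400ull * 1024 * 1024;
    if (!has_enough_vram(required)) {
        std::fprintf(stderr, "Not enough VRAM to load this model.\n");
        return 1;
    }
    std::printf("Model should fit in VRAM.\n");
    return 0;
}
```

If the check fails, the UI could surface a 'Not enough VRAM' badge instead of the silent 'Starting model' hang described below.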
Steps to Reproduce
- Install additional Cortex.cpp dependencies
- Enable GPU acceleration
- Download model Llama 3.2 3B Instruct Q8
- Start a new thread and enter some text
- Observe the 'Starting model' loading indicator
- Nothing further happens; the model never loads and no error is shown
Screenshots / Logs
What is your OS?
- [ ] macOS
- [ ] Windows
- [X] Linux