jan
bug: GPU not utilized with manually loaded models
Describe the bug With manually loaded models, e.g. https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF, the GPU is not used and all inference runs on the CPU.
Without a proper CUDA toolkit and an Nvidia GPU this is expected, but the bug occurs even when both are installed correctly (and their versions are shown in settings.json).
Steps to reproduce Steps to reproduce the behavior: (Using Jan nightly v0.4.3-141)
- Download OpenHermes GGUF (or any other GGUF quantized model)
- Place it in its own folder under the models directory
- Run Jan to generate the model.json
- Load the model as usual, including ensuring that it is set to use GPU
- When the model loads, no VRAM is occupied, and no GPU is used during inference.
Expected behavior The GPU should be utilized, just as with models downloaded directly through Jan.
Environment details
- Operating System: Windows 11
- Jan Version: Nightly v0.4.3-141
- Processor: Ryzen 5 7600x
- RAM: 64GB
- GPU: Nvidia 3080TI 12GB
- CUDA Toolkit: 12.3
- NVIDIA Driver: 546.12
The same happens with https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF:
- Operating System: Windows 11
- Jan Version: Nightly v0.4.3-141
- Processor: Ryzen 7 7800X3D
- RAM: 64GB
- GPU: Nvidia 4080 16GB
On the left is the model.json for a model downloaded through Jan; on the right is the generated model.json for a manually imported model.
It appears that the generated model.json overrides the settings in the GUI; for instance,
"ngl": 0,
"embedding": false,
take precedence even when you enable these options in the GUI.
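Until the fix lands, one workaround implied by the above is to edit the generated model.json by hand so that `ngl` is positive. A minimal sketch (the path and the `ngl` value of 35 are assumptions; adjust both to your Jan data folder, model name, and VRAM):

```python
import json
from pathlib import Path

# Hypothetical location -- adjust to your Jan data folder and model folder name.
model_json = Path.home() / "jan" / "models" / "openhermes-2.5-mistral-7b" / "model.json"

def patch_gpu_settings(path: Path, ngl: int = 35) -> None:
    """Set a positive layer-offload count so the generated file no longer
    forces CPU-only inference ("ngl": 0 disables GPU offload)."""
    cfg = json.loads(path.read_text())
    settings = cfg.setdefault("settings", {})
    settings["ngl"] = ngl          # number of layers to offload to the GPU (assumed value)
    settings["embedding"] = True   # re-enable if the GUI toggle was being overridden
    path.write_text(json.dumps(cfg, indent=2))

if model_json.exists():
    patch_gpu_settings(model_json)
```

After saving, reload the model in Jan and check that VRAM is now occupied.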
Thank you, this is a great find. We'll find a sprint to fix it.
Tested and looking good on Jan v0.4.4-163 ✅

Sample generated model.json for an imported model:
```json
{
  "object": "model",
  "version": 1,
  "format": "gguf",
  "source_url": "N/A",
  "id": "trinity-v1-7b-q3",
  "name": "trinity-v1-7b-q3",
  "created": 1706001923574,
  "description": "trinity-v1-7b-q3 - user self import model",
  "settings": {
    "ctx_len": 4096,
    "embedding": false,
    "prompt_template": "{system_message}\n### Instruction: {prompt}\n### Response:"
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "max_tokens": 2048,
    "stop": [
      "<endofstring>"
    ],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "size": 3518985920,
    "author": "User",
    "tags": []
  },
  "engine": "nitro"
}
```