jan bug: GPU not utilized with manually loaded models

Describe the bug

With manually loaded models, e.g. https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF, the GPU is not used and all inference is performed on the CPU.

Of course, this can happen without a proper CUDA toolkit and Nvidia GPU, but the bug appears to occur even with both properly installed (and their versions displayed in settings.json).

Steps to reproduce

Using Jan nightly v0.4.3-141:

Download OpenHermes GGUF (or any other GGUF quantized model)
Place it in its own folder under the models directory
Run Jan to generate the model.json
Load the model as usual, including ensuring that it is set to use GPU
When the model loads, no VRAM is occupied, and during inference the GPU is not used.

Expected behavior

The GPU should be utilized, as it is with models downloaded directly.

Environment details

Operating System: Windows 11
Jan Version: Nightly v0.4.3-141
Processor: Ryzen 5 7600x
RAM: 64GB
GPU: Nvidia 3080TI 12GB
CUDA Toolkit: 12.3
NVIDIA Driver: 546.12

Jan 12 '24 19:01 TheRealBeef

Same with https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF :

Operating System: Windows 11
Jan Version: Nightly v0.4.3-141
Processor: Ryzen 7 7800X3D
RAM: 64GB
GPU: Nvidia 4080 16GB

Jan 12 '24 19:01 Undefined3301

On the left is a model downloaded through Jan; on the right is the generated JSON file for a manually imported model.

So it appears that the generated .json overrides the settings in the GUI. For instance, the generated values

"ngl": 0,
"embedding": false,

take precedence over the GUI settings when you attempt to enable these options.
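
For anyone who wants to check their own generated file, here is a minimal sketch that reports whether a model.json pins GPU offload to zero. It assumes that "ngl" (the number of layers offloaded to the GPU) sits under the "settings" object, next to "embedding"; the function name and the sample input are made up for illustration.

```python
import json


def hardcoded_gpu_settings(model_json_text: str) -> dict:
    """Return the settings entries that would keep inference on the CPU.

    Assumption: "ngl" lives under "settings" in the generated model.json.
    """
    model = json.loads(model_json_text)
    settings = model.get("settings", {})
    # "ngl": 0 means no layers are offloaded to the GPU.
    return {key: value for key, value in settings.items()
            if key == "ngl" and value == 0}


# Hypothetical fragment modelled on the values quoted above.
sample = '{"settings": {"ngl": 0, "embedding": false}}'
print(hardcoded_gpu_settings(sample))
```

An empty result means the file does not hard-code "ngl" to 0, so the cause would have to be elsewhere.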

Jan 12 '24 20:01 TheRealBeef

Thank you, this is a great find. We'll find a sprint to fix it.

Jan 14 '24 14:01 freelerobot

Tested and looking good on Jan v0.4.4-163 ✅

Sample generated model.json for an imported model:

{
  "object": "model",
  "version": 1,
  "format": "gguf",
  "source_url": "N/A",
  "id": "trinity-v1-7b-q3",
  "name": "trinity-v1-7b-q3",
  "created": 1706001923574,
  "description": "trinity-v1-7b-q3 - user self import model",
  "settings": {
    "ctx_len": 4096,
    "embedding": false,
    "prompt_template": "{system_message}\n### Instruction: {prompt}\n### Response:"
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "max_tokens": 2048,
    "stop": [
      "<endofstring>"
    ],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "size": 3518985920,
    "author": "User",
    "tags": []
  },
  "engine": "nitro"
}
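
The sample above no longer contains a hard-coded "ngl" entry. For a model.json generated by the earlier nightly, a possible workaround is to drop that key by hand. The sketch below does this on an in-memory dict. It assumes that "ngl" sits under "settings" and that removing it lets the GUI value take effect, which this thread does not confirm; the function name and the example dict are hypothetical.

```python
import copy
import json


def drop_hardcoded_ngl(model: dict) -> dict:
    """Return a copy of a model.json dict without a hard-coded "ngl" setting."""
    patched = copy.deepcopy(model)  # leave the caller's dict untouched
    # Assumption: "ngl" is stored under "settings".
    patched.get("settings", {}).pop("ngl", None)
    return patched


# Hypothetical model.json content as an older nightly might have generated it.
old_model = {
    "id": "example-model",
    "settings": {"ctx_len": 4096, "ngl": 0, "embedding": False},
}
print(json.dumps(drop_hardcoded_ngl(old_model), indent=2))
```

To apply this to a real file, load it with json.load, pass the dict through the function, and write the result back with json.dump after making a backup.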

Jan 23 '24 09:01 Van-QA
