bug: GPU not utilized with manually loaded models

Open TheRealBeef opened this issue 1 year ago • 3 comments

Describe the bug With manually loaded models, e.g. https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF, the GPU is not used and all inference is performed on the CPU.

Of course, this can happen without a proper CUDA Toolkit installation and an Nvidia GPU, but the bug occurs even when both are set up correctly (and their versions are displayed in settings.json).

Steps to reproduce (using Jan nightly v0.4.3-141):

  1. Download OpenHermes GGUF (or any other GGUF quantized model)
  2. Place it in its own folder under the models directory
  3. Run Jan to generate the model.json
  4. Load the model as usual, including ensuring that it is set to use GPU
  5. When the model loads, no VRAM is occupied, and during inference the GPU is not used.
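
One way to confirm step 5 with numbers rather than by eye is to sample VRAM usage before and after loading the model. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names (`parse_used_mib`, `query_used_mib`, `vram_delta`) are made up for illustration and are not part of Jan.

```python
import subprocess


def parse_used_mib(output: str) -> list[int]:
    """Parse nvidia-smi CSV output (one used-memory value in MiB per GPU)."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_used_mib() -> list[int]:
    """Ask nvidia-smi for the memory currently in use on each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_used_mib(result.stdout)


def vram_delta(before: list[int], after: list[int]) -> list[int]:
    """Per-GPU change in used memory between two samples, in MiB."""
    return [a - b for b, a in zip(before, after)]


# Hypothetical usage: sample, load the model in Jan, then sample again.
# before = query_used_mib()
# input("Load the model in Jan, then press Enter...")
# print(vram_delta(before, query_used_mib()))
```

If the delta stays near zero after a 7B GGUF model has loaded, the weights were most likely not offloaded to the GPU.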

Expected behavior The GPU should be utilized, as it is with models downloaded directly through Jan.

Environment details

  • Operating System: Windows 11
  • Jan Version: Nightly v0.4.3-141
  • Processor: Ryzen 5 7600x
  • RAM: 64GB
  • GPU: Nvidia 3080TI 12GB
  • CUDA Toolkit: 12.3
  • NVIDIA Driver: 546.12

TheRealBeef avatar Jan 12 '24 19:01 TheRealBeef

Same with https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF :

  • Operating System: Windows 11
  • Jan Version: Nightly v0.4.3-141
  • Processor: Ryzen 7 7800X3D
  • RAM: 64GB
  • GPU: Nvidia 4080 16GB

Undefined3301 avatar Jan 12 '24 19:01 Undefined3301

[Image] On the left is the JSON for a model downloaded through Jan; on the right is the generated JSON file for a manually imported model.

So it appears that the generated model.json takes precedence over the settings in the GUI. For instance, the entries

"ngl": 0,
"embedding": false,

override the GUI settings when you attempt to enable these options.
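
Until this is fixed, a possible workaround is to edit the generated model.json by hand. The sketch below automates that edit under several assumptions that the thread does not confirm: that `ngl` sits inside the `settings` object (as `embedding` does), that it means the number of layers offloaded to the GPU, and that a large value such as 100 offloads every layer. The function names are hypothetical.

```python
import json
from pathlib import Path


def enable_gpu_offload(model: dict, ngl: int = 100) -> dict:
    """Return a copy of the model config with a pinned `ngl: 0` raised.

    Assumption: `ngl` lives under "settings". Configs without a pinned
    zero are returned unchanged.
    """
    patched = json.loads(json.dumps(model))  # cheap deep copy of JSON data
    settings = patched.get("settings")
    if isinstance(settings, dict) and settings.get("ngl") == 0:
        settings["ngl"] = ngl  # assumed: larger than the model's layer count
    return patched


def patch_file(path: Path, ngl: int = 100) -> bool:
    """Rewrite model.json in place; return True if anything changed."""
    original = json.loads(path.read_text(encoding="utf-8"))
    patched = enable_gpu_offload(original, ngl)
    if patched == original:
        return False
    path.write_text(json.dumps(patched, indent=2), encoding="utf-8")
    return True


# Hypothetical usage; the path depends on where Jan stores imported models.
# patch_file(Path("models/my-imported-model/model.json"))
```

The model would presumably need to be reloaded in Jan for the change to take effect.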

TheRealBeef avatar Jan 12 '24 20:01 TheRealBeef

Thank you, this is a great find. We'll find a sprint to fix it.

freelerobot avatar Jan 14 '24 14:01 freelerobot

Tested and looking good on Jan v0.4.4-163 ✅

Sample generated model.json for an imported model:

{
  "object": "model",
  "version": 1,
  "format": "gguf",
  "source_url": "N/A",
  "id": "trinity-v1-7b-q3",
  "name": "trinity-v1-7b-q3",
  "created": 1706001923574,
  "description": "trinity-v1-7b-q3 - user self import model",
  "settings": {
    "ctx_len": 4096,
    "embedding": false,
    "prompt_template": "{system_message}\n### Instruction: {prompt}\n### Response:"
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "max_tokens": 2048,
    "stop": [
      "<endofstring>"
    ],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "size": 3518985920,
    "author": "User",
    "tags": []
  },
  "engine": "nitro"
}

Van-QA avatar Jan 23 '24 09:01 Van-QA