jan
docs: llama.cpp/GGUF CPU offloading no longer present?
Pages
- https://jan.ai/docs/built-in/llama-cpp
Success Criteria
- My install of jan.ai doesn't have a ~/jan/engines/nitro.json at all. It only has groq.json and openai.json. I have already run multiple local GGUF models on this install; if the file were self-generated, it should already exist.
- Looking into model.json, it does not have the "ngl": 100 line at all.
Additional context
Is this feature still in jan.ai? I am trying to run models bigger than my GPU's VRAM limit.
hi @mr-september,
1. For model.json, here is how to modify the settings to include ngl:
2. For nitro.json: the engines folder no longer exists; we refactored it in the Jan app and will update the docs shortly.
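For point 1, the edit can be sketched as a small script. This is a sketch under assumptions: the model.json path is illustrative (adjust it to your own model folder), and the "settings"/"ngl" key names follow this thread rather than an official spec.

```python
import json
from pathlib import Path

def set_ngl(config: dict, ngl: int = 100) -> dict:
    """Add or overwrite the ngl setting (number of layers offloaded to
    the GPU) in a model.json-style dict. Key names follow this thread."""
    config.setdefault("settings", {})["ngl"] = ngl
    return config

# Illustrative path only; point this at your actual model folder.
path = Path.home() / "jan" / "models" / "my-model" / "model.json"
if path.exists():
    patched = set_ngl(json.loads(path.read_text()), ngl=100)
    path.write_text(json.dumps(patched, indent=2))
```

The `path.exists()` guard just keeps the sketch safe to run as-is; in practice you would edit the file by hand or via the app.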
cc: @cahyosubroto @aindrajaya @irfanpena to update the 2 points mentioned above in our docs. Note that for No. 2, we will need to correct nitro.json and the engines folder everywhere they appear in our docs, as they no longer exist.
@Van-QA Can the parameters in the https://jan.ai/docs/built-in/llama-cpp for nitro.json be used for the settings parameters in model.json?
hi @irfanpena, all 5 parameters here can be applied to model.json:
"ctx_len": 2048,
"ngl": 100,
"cpu_threads": 1,
"cont_batching": false,
"embedding": false
where "ctx_len" and "ngl" are the most important (have the most impact). Thank you
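For context, a hedged sketch of where those keys might sit inside model.json (the "settings" object follows the shape discussed in this thread; the surrounding fields are purely illustrative, not Jan's actual schema):

```json
{
  "id": "my-local-model",
  "settings": {
    "ctx_len": 2048,
    "ngl": 100,
    "cpu_threads": 1,
    "cont_batching": false,
    "embedding": false
  }
}
```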
Thanks @Van-QA, that seems to be working on my end. Feel free to close this issue any time the team deems the docs updated.
If I may suggest, could this be added into the GUI as well? Ideally with some kind of general estimation (e.g. jan.ai detects my system has 8GB VRAM and 32GB RAM, and the model size is 12GB - suggest default 50% offload, etc.)
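The estimation suggested above could be sketched as a simple heuristic. This is not Jan's actual behavior, just an illustration of the idea: offload whatever fraction of the layers fits in VRAM, keeping some headroom for the KV cache and activations.

```python
def suggest_ngl(model_size_gb: float, vram_gb: float,
                total_layers: int, headroom_gb: float = 1.0) -> int:
    """Rough, illustrative heuristic (an assumption, not Jan's logic):
    suggest how many layers (ngl) to offload to the GPU based on how
    much of the model fits in VRAM after reserving some headroom."""
    usable = max(vram_gb - headroom_gb, 0.0)
    fraction = min(usable / model_size_gb, 1.0)
    return int(fraction * total_layers)

# e.g. a 12 GB model on an 8 GB GPU with 40 layers:
# suggest_ngl(12, 8, 40) -> 23 layers offloaded, rest on CPU
```

A real implementation would also need the model's layer count and per-layer memory footprint from the GGUF metadata, which this sketch glosses over.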
Linking the issue to https://github.com/janhq/jan/issues/2208, related to RAM/VRAM utilization.