jan
docs: llama.cpp/GGUF CPU offloading no longer present?
Pages
- https://jan.ai/docs/built-in/llama-cpp
Success Criteria
- My install of jan.ai doesn't have a ~/jan/engines/nitro.json at all. It only has groq.json and openai.json. I have already run multiple local GGUF models on this install; if the file were self-generated, it should already exist.
- Looking into model.json, it does not have the "ngl": 100 line at all.
Additional context
Is this feature still in jan.ai? I am trying to run models bigger than my GPU's VRAM limit.
hi @mr-september,
1. For model.json, here is how to modify the settings to include ngl:
2. For nitro.json: the engines folder no longer exists; we refactored it in the Jan app and will update the docs shortly.
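For point 1, the edit can be sketched as a small script. This is a sketch under assumptions: the model.json path is illustrative (adjust it to your own model folder), and the "settings"/"ngl" key names follow this thread rather than an official spec.

```python
import json
from pathlib import Path

def set_ngl(config: dict, ngl: int = 100) -> dict:
    """Add or overwrite the ngl setting (number of layers offloaded to
    the GPU) in a model.json-style dict. Key names follow this thread."""
    config.setdefault("settings", {})["ngl"] = ngl
    return config

# Illustrative path only; point this at your actual model folder.
path = Path.home() / "jan" / "models" / "my-model" / "model.json"
if path.exists():
    patched = set_ngl(json.loads(path.read_text()), ngl=100)
    path.write_text(json.dumps(patched, indent=2))
```

The `path.exists()` guard just keeps the sketch safe to run as-is; in practice you would edit the file by hand or via the app.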
cc: @cahyosubroto @aindrajaya @irfanpena to update the 2 points mentioned above in our docs. Note that for No. 2, we will need to correct nitro.json and the engines folder everywhere they appear in our docs, as they no longer exist.
@Van-QA Can the parameters in the https://jan.ai/docs/built-in/llama-cpp for nitro.json be used for the settings parameters in model.json?
hi @irfanpena, all 5 parameters here can be applied to model.json:
"ctx_len": 2048,
"ngl": 100,
"cpu_threads": 1,
"cont_batching": false,
"embedding": false
where "ctx_len" and "ngl" are the most important (have the most impact). Thank you
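For context, a hedged sketch of where those keys might sit inside model.json (the "settings" object follows the shape discussed in this thread; the surrounding fields are purely illustrative, not Jan's actual schema):

```json
{
  "id": "my-local-model",
  "settings": {
    "ctx_len": 2048,
    "ngl": 100,
    "cpu_threads": 1,
    "cont_batching": false,
    "embedding": false
  }
}
```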
Thanks @Van-QA, that seems to be working on my end. Feel free to close this issue any time the team deems the docs updated.
If I may suggest, could this be added into the GUI as well? Ideally with some kind of general estimation (e.g. jan.ai detects my system has 8GB VRAM and 32GB RAM, and the model size is 12GB - suggest default 50% offload, etc.)
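The estimation suggested above could be sketched as a simple heuristic. This is not Jan's actual behavior, just an illustration of the idea: offload whatever fraction of the layers fits in VRAM, keeping some headroom for the KV cache and activations.

```python
def suggest_ngl(model_size_gb: float, vram_gb: float,
                total_layers: int, headroom_gb: float = 1.0) -> int:
    """Rough, illustrative heuristic (an assumption, not Jan's logic):
    suggest how many layers (ngl) to offload to the GPU based on how
    much of the model fits in VRAM after reserving some headroom."""
    usable = max(vram_gb - headroom_gb, 0.0)
    fraction = min(usable / model_size_gb, 1.0)
    return int(fraction * total_layers)

# e.g. a 12 GB model on an 8 GB GPU with 40 layers:
# suggest_ngl(12, 8, 40) -> 23 layers offloaded, rest on CPU
```

A real implementation would also need the model's layer count and per-layer memory footprint from the GGUF metadata, which this sketch glosses over.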
Linking the issue to https://github.com/janhq/jan/issues/2208, related to RAM/VRAM utilization.