feat: Jan supports safetensors
@Van-QA quoted this from feature request janhq/jan#2723:
**Problem:** Jan can only use GGUF models, not the wider range of model formats available. If Jan could run Hugging Face Transformers models, it would support most models.
**Success Criteria:** Pick a suitable Hugging Face Transformers model and make it available in Jan.
Support for converting a Hugging Face safetensors model to GGUF and using it was added in https://github.com/janhq/jan/pull/1972.
Although issue https://github.com/janhq/jan/issues/2167 is resolved, the Import via Hugging Face feature is on hold until the epic https://github.com/janhq/cortex/issues/571 is complete.
I have investigated the technical feasibility of this.
Please read more in this doc (draft): https://f1da82fe.docs-9ba.pages.dev/guides/glossaries/gguf
Basically, there are two steps to produce a single GGUF model:
- Convert the Hugging Face .safetensors model to GGUF BF16 (normally takes around 2 minutes). This requires the convert-hf-to-gguf.py script (which can be executed using the cortex Python runtime). Example command:
  ```bash
  python llama.cpp/convert-hf-to-gguf.py models --outtype bf16 --outfile "${{ env.MODEL_NAME }}/${{ env.bf16 }}"
  ```
- Once we have the GGUF BF16 model, the user can choose the quantization type they want and run the quantization (around 2 minutes). llama.cpp exposes a low-level C++ API for this in quantize. Example command (see the Python sketch after this list):
  ```bash
  ./llama.cpp/quantize "${{ env.MODEL_NAME }}/${{ env.bf16 }}" "${{ env.MODEL_NAME }}/$qtype" "$method"
  ```
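To make the flow concrete, here is a minimal sketch that chains the two commands above with Python's subprocess module. It assumes a local llama.cpp checkout containing convert-hf-to-gguf.py and a built quantize binary; MODEL_DIR, MODEL_NAME, and QUANT_METHOD are hypothetical placeholders for illustration, not names from the proposal.

```python
# Minimal sketch of the two-step safetensors -> GGUF pipeline described above.
# Assumes a local llama.cpp checkout with convert-hf-to-gguf.py and a built
# quantize binary. MODEL_DIR, MODEL_NAME, and QUANT_METHOD are hypothetical.
import subprocess
from pathlib import Path

MODEL_DIR = Path("models")        # hypothetical: downloaded HF snapshot (.safetensors)
MODEL_NAME = "my-model"           # hypothetical model name
QUANT_METHOD = "Q4_K_M"           # quantization type chosen by the user
LLAMA_CPP = Path("llama.cpp")     # hypothetical path to the llama.cpp checkout

bf16_path = Path(MODEL_NAME) / f"{MODEL_NAME}-bf16.gguf"
quant_path = Path(MODEL_NAME) / f"{MODEL_NAME}-{QUANT_METHOD}.gguf"
bf16_path.parent.mkdir(parents=True, exist_ok=True)

# Step 1: convert the safetensors checkpoint to a BF16 GGUF file (~2 minutes).
subprocess.run(
    ["python", str(LLAMA_CPP / "convert-hf-to-gguf.py"), str(MODEL_DIR),
     "--outtype", "bf16", "--outfile", str(bf16_path)],
    check=True,
)

# Step 2: quantize the BF16 GGUF to the type the user picked (~2 minutes).
subprocess.run(
    [str(LLAMA_CPP / "quantize"), str(bf16_path), str(quant_path), QUANT_METHOD],
    check=True,
)

print(f"Quantized model written to {quant_path}")
```

Using check=True makes each step fail fast, so a broken conversion never reaches the quantization stage.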
I think this would help a lot with the adoption of the cortex CLI and the Jan app.
related: https://github.com/janhq/cortex/issues/555