feat: Jan supports safetensors
@Van-QA quoted this from feature request janhq/jan#2723:
**Problem:** Jan can only use GGUF models, not the wider range of model formats available. If Jan could run Hugging Face Transformers models, it would support most models.
**Success Criteria:** Pick a suitable Hugging Face Transformers model and make it available in Jan.
Support for converting a Hugging Face safetensors model to GGUF and using it was added in https://github.com/janhq/jan/pull/1972.
Although issue https://github.com/janhq/jan/issues/2167 is resolved, the Import via Hugging Face feature is on hold until the epic https://github.com/janhq/cortex/issues/571 is complete.
I have investigated the technical feasibility of this.
Please read more in this doc (draft): https://f1da82fe.docs-9ba.pages.dev/guides/glossaries/gguf
Basically, there are two steps to produce a single GGUF model:
- Convert the Hugging Face .safetensors model to GGUF BF16 (normally takes around 2 minutes). This requires the convert-hf-to-gguf.py script (which can be executed using the cortex Python runtime). Example command:
  ```bash
  python llama.cpp/convert-hf-to-gguf.py models --outtype bf16 --outfile "${{ env.MODEL_NAME }}/${{ env.bf16 }}"
  ```
- Once we have the GGUF BF16 model, the user can choose the quantization type they want and run the quantization (around 2 minutes). llama.cpp exposes a low-level C++ API for this in quantize. Example command (see the Python sketch after this list):
  ```bash
  ./llama.cpp/quantize "${{ env.MODEL_NAME }}/${{ env.bf16 }}" "${{ env.MODEL_NAME }}/$qtype" "$method"
  ```
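To make the flow concrete, here is a minimal sketch that chains the two commands above with Python's subprocess module. It assumes a local llama.cpp checkout containing convert-hf-to-gguf.py and a built quantize binary; MODEL_DIR, MODEL_NAME, and QUANT_METHOD are hypothetical placeholders for illustration, not names from the proposal.

```python
# Minimal sketch of the two-step safetensors -> GGUF pipeline described above.
# Assumes a local llama.cpp checkout with convert-hf-to-gguf.py and a built
# quantize binary. MODEL_DIR, MODEL_NAME, and QUANT_METHOD are hypothetical.
import subprocess
from pathlib import Path

MODEL_DIR = Path("models")        # hypothetical: downloaded HF snapshot (.safetensors)
MODEL_NAME = "my-model"           # hypothetical model name
QUANT_METHOD = "Q4_K_M"           # quantization type chosen by the user
LLAMA_CPP = Path("llama.cpp")     # hypothetical path to the llama.cpp checkout

bf16_path = Path(MODEL_NAME) / f"{MODEL_NAME}-bf16.gguf"
quant_path = Path(MODEL_NAME) / f"{MODEL_NAME}-{QUANT_METHOD}.gguf"
bf16_path.parent.mkdir(parents=True, exist_ok=True)

# Step 1: convert the safetensors checkpoint to a BF16 GGUF file (~2 minutes).
subprocess.run(
    ["python", str(LLAMA_CPP / "convert-hf-to-gguf.py"), str(MODEL_DIR),
     "--outtype", "bf16", "--outfile", str(bf16_path)],
    check=True,
)

# Step 2: quantize the BF16 GGUF to the type the user picked (~2 minutes).
subprocess.run(
    [str(LLAMA_CPP / "quantize"), str(bf16_path), str(quant_path), QUANT_METHOD],
    check=True,
)

print(f"Quantized model written to {quant_path}")
```

Using check=True makes each step fail fast, so a broken conversion never reaches the quantization stage.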
I think this would help a lot with the adoption of the cortex CLI and the Jan app.
related: https://github.com/janhq/cortex/issues/555