Do you mean how do you run `shaw/dmeta-embedding-zh` if the server is not connected to the internet? You download the model and copy it to the [model directory](https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored) on the...
shaw/dmeta-embedding-zh is an embedding model, so you can't use `run` to load it. Your client needs to send an API call:

```console
$ curl localhost:11434/api/embed -d '{"model":"shaw/dmeta-embedding-zh","input":"make an embedding"}'
{"model":"shaw/dmeta-embedding-zh","embeddings":[[0.0023328515,-0.002045153 ......
```
You need to copy the model to your server. What does the following show:

```
ollama list
```
> I'm quite sure the model is in the list. Every time I need to enter the model name, I copy the result from the list command. :)

Then show the...
Your client needs to make an API call with the name of the model in the `model` field.
The model is loaded when the client makes an API call with the name of the model in the `model` field.
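For example, a minimal round trip looks like this (the model name is assumed to match what `ollama list` reports on your server):

```console
$ curl localhost:11434/api/embed -d '{"model":"shaw/dmeta-embedding-zh","input":"test"}'
$ ollama ps    # the model should now show up as loaded
```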
What commands did you run to quantize the model?
I can confirm that the quantized model fails to load. I downloaded the model from [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct), converted it to GGUF with ghcr.io/ggerganov/llama.cpp:full-cuda--b1-de28008, and then quantized it to Q4_K_M. When trying to create...
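For anyone trying to reproduce this, the conversion and quantization steps look roughly like the following sketch (run outside the container for brevity; the file and model names here are illustrative, not my exact invocations):

```console
$ python convert_hf_to_gguf.py ./gte-Qwen2-7B-instruct --outtype f16 --outfile gte-Qwen2-7B-instruct-f16.gguf
$ ./llama-quantize gte-Qwen2-7B-instruct-f16.gguf gte-Qwen2-7B-instruct-Q4_K_M.gguf Q4_K_M
$ ollama create gte-qwen2-7b-q4 -f Modelfile    # Modelfile contains: FROM ./gte-Qwen2-7B-instruct-Q4_K_M.gguf
```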
The problem here is that the llama.cpp quantizer pads the output with null bytes until it's a multiple of 32 bytes long. The llama.cpp inference engine doesn't care about the...
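A quick way to see the padding on a quantized file (illustrative file name; GNU coreutils assumed):

```console
$ FILE=gte-Qwen2-7B-instruct-Q4_K_M.gguf
$ echo $(( $(stat -c %s "$FILE") % 32 ))    # should print 0: the quantizer pads the file to a multiple of 32 bytes
$ tail -c 32 "$FILE" | xxd                  # if padding was added, the trailing bytes are 0x00
```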