Do you mean how do you run `shaw/dmeta-embedding-zh` if the server is not connected to the internet? You download the model and copy it to the [model directory](https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored) on the...
shaw/dmeta-embedding-zh is an embedding model, so you can't use `run` to load it. Your client needs to send an API call:

```console
$ curl localhost:11434/api/embed -d '{"model":"shaw/dmeta-embedding-zh","input":"make an embedding"}'
{"model":"shaw/dmeta-embedding-zh","embeddings":[[0.0023328515,-0.002045153 ......
```
You need to copy the model to your server. What does the following show:

```
ollama list
```
> I'm quite sure the model is in the list. Every time I need to enter the model name, I copy the result from the list command. :)

Then show the...
Your client needs to make an API call with the name of the model in the `model` field.
The model is loaded when the client makes an API call with the name of the model in the `model` field.
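For example, a minimal round trip looks like this (the model name is assumed to match what `ollama list` reports on your server):

```console
$ curl localhost:11434/api/embed -d '{"model":"shaw/dmeta-embedding-zh","input":"test"}'
$ ollama ps    # the model should now show up as loaded
```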
What commands did you run to quantize the model?
I can confirm that the quantized model fails to load. I downloaded the model from [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct), converted it to GGUF with ghcr.io/ggerganov/llama.cpp:full-cuda--b1-de28008, and then quantized it to Q4_K_M. When trying to create...
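For anyone trying to reproduce this, the conversion and quantization steps look roughly like the following sketch (run outside the container for brevity; the file and model names here are illustrative, not my exact invocations):

```console
$ python convert_hf_to_gguf.py ./gte-Qwen2-7B-instruct --outtype f16 --outfile gte-Qwen2-7B-instruct-f16.gguf
$ ./llama-quantize gte-Qwen2-7B-instruct-f16.gguf gte-Qwen2-7B-instruct-Q4_K_M.gguf Q4_K_M
$ ollama create gte-qwen2-7b-q4 -f Modelfile    # Modelfile contains: FROM ./gte-Qwen2-7B-instruct-Q4_K_M.gguf
```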
The problem here is that the llama.cpp quantizer pads the output with null bytes until it's a multiple of 32 bytes long. The llama.cpp inference engine doesn't care about the...
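A quick way to see the padding on a quantized file (illustrative file name; GNU coreutils assumed):

```console
$ FILE=gte-Qwen2-7B-instruct-Q4_K_M.gguf
$ echo $(( $(stat -c %s "$FILE") % 32 ))    # should print 0: the quantizer pads the file to a multiple of 32 bytes
$ tail -c 32 "$FILE" | xxd                  # if padding was added, the trailing bytes are 0x00
```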