Patrick Devine
@LumiWasTaken can you post the logs?
@utility-aagrawal What kind of GPU are you using? Is the model offloaded entirely onto the GPU or only partially?
It looks like `convert.py` is creating a gguf which doesn't have the tokens KV. In `llm.GraphSize()` we're panicking when trying to get the vocab.

> vocab := uint64(len(llm.KV()["tokenizer.ggml.tokens"].([]any)))

@laik the...
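For illustration, here is a minimal sketch of how that lookup could report a missing key instead of panicking. The helper name `vocabSize` and the plain `map[string]any` parameter are assumptions made for this example; they stand in for whatever `llm.KV()` actually returns and are not the project's real code.

```go
package main

import "fmt"

// vocabSize is a hypothetical helper. It returns the number of entries stored
// under "tokenizer.ggml.tokens", and false when the key is missing or holds a
// value that is not a []any. The comma-ok form of the type assertion avoids
// the panic that the single-value form raises in that case.
func vocabSize(kv map[string]any) (uint64, bool) {
	tokens, ok := kv["tokenizer.ggml.tokens"].([]any)
	if !ok {
		return 0, false
	}
	return uint64(len(tokens)), true
}

func main() {
	// A KV map without the tokens entry, as in the gguf described above.
	n, ok := vocabSize(map[string]any{})
	fmt.Println(n, ok) // prints "0 false"
}
```

With the check in place, the caller can return an error that names the missing key rather than crashing.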
@rebas3 you can follow the directions that @centopw posted. You'll need to:

```
launchctl setenv OLLAMA_HOST "0.0.0.0"
```

and then restart ollama. Launchctl environment variables don't persist between reboots though....
Hey @David20080125, usually this happens because the GGUF file is corrupt, or you didn't actually download the file. Can you verify the checksum on the file is correct?
@Nimmalapudi-Pratyusha I think you're just specifying the wrong path for the file.
@Nimmalapudi-Pratyusha did you manage to get this to work?
Glad you got it working!
Are you using the CLI or the API? Do all subsequent requests continue to be slow, or just the first time you do it after 30 minutes?
@353167931 Can you double check:

1. the system clock is set correctly on the machine; and
2. that you're up-to-date with system updates

I *think* this is an error with...