
Ollama provider doesn't use correct endpoint

Open deepfates opened this issue 1 year ago • 13 comments

Describe the bug

The Ollama model provider sends a POST to localhost:11434/api/chat instead of the correct endpoint, which should be either localhost:11434/v1/chat/completions (OpenAI-compatible) or localhost:11434/api/generate (Ollama native API).
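
For reference, a minimal sketch of a call against the native generate endpoint mentioned above, assuming a default local Ollama install on port 11434; the payload fields follow the public Ollama API documentation and are not taken from the eliza codebase:

```typescript
// Minimal sketch, not from the eliza source: a direct call to Ollama's native
// generate endpoint on a default local install. The OpenAI-compatible
// alternative lives at /v1/chat/completions on the same server.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2:1b", // must match the exact local model tag
    prompt: "Hello",
    stream: false,
  }),
});
const { response } = await res.json(); // generated text is returned in `response`
```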

To Reproduce

Set OLLAMA_MODEL to a local model in .env and "modelProvider": "ollama" in the character file.
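
Concretely, a minimal configuration along these lines should reproduce it (the model tag here is only an example):

```
# .env
OLLAMA_MODEL="llama3.2:1b"

# character file (fragment)
"modelProvider": "ollama"
```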

Expected behavior

Get a response instead of AI_APICallError

deepfates avatar Dec 07 '24 02:12 deepfates

This is weird; I've been using the Ollama provider exclusively for a while now. I know this part of the code, and your suggestion about making it more compatible with the OpenAI API is interesting.

djaramil avatar Dec 12 '24 16:12 djaramil

It works for you? What endpoint is it calling on the ollama server when it's successful?

deepfates avatar Dec 13 '24 00:12 deepfates

@djaramil aren't you getting infinite responses from ollama or llama_local?

dr-fusion avatar Dec 18 '24 07:12 dr-fusion

> It works for you? What endpoint is it calling on the ollama server when it's successful?

Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

Etherdrake avatar Dec 19 '24 22:12 Etherdrake

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

I tried with llama_local as well and it goes into an infinite loop, answering older questions etc., even after using it for 2-4 hours. I'm using it on an M3 Max with 128 GB, and I made a change to handle Metal as well.

dr-fusion avatar Dec 20 '24 08:12 dr-fusion

> M3 Max with 128 GB

I am not well-versed in macOS, but if I had to guess, the GPU is not powerful enough for the model you're using. Maybe try something like Gemma:2B.

Etherdrake avatar Dec 20 '24 12:12 Etherdrake

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

This isn't using Ollama at all, as far as I can tell. I'm glad that it works for you but it's not the same service at all

deepfates avatar Dec 29 '24 00:12 deepfates

I'm interested in looking into this. Are you all still having issues with the actual Ollama provider?

AIFlowML avatar Jan 03 '25 04:01 AIFlowML

Not sure if this will help, but I had this same issue when I wasn't using the exact model name for the OLLAMA_MODEL env var. For example, for llama3.2:1b it had to be OLLAMA_MODEL="llama3.2:1b" and not OLLAMA_MODEL="llama3.2".

james-ingold avatar Jan 03 '25 20:01 james-ingold

When I try to use my Ollama models, the agent just gives me this empty error message. It keeps trying, but never works:

```
ℹ INFORMATIONS Generating text with options: {"modelProvider":"ollama","model":"large"}
ℹ INFORMATIONS Selected model: llama3.2:latest
⛔ ERRORS Error in generateText: {}
⛔ ERRORS ERROR: {}
```

squintdev avatar Jan 03 '25 20:01 squintdev

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

> This isn't using Ollama at all, as far as I can tell. I'm glad that it works for you but it's not the same service at all

My GPU utilization ran to 100% using it. If it's not the same service, have you found what it is using? Output was extremely slow for me so maybe it started an instance using the integrated graphics on my 13900K while I was running another instance through Msty on my Nvidia GPU.

Etherdrake avatar Jan 04 '25 16:01 Etherdrake

It's possible that this issue is just related to the naming of the models in Ollama. Requesting feedback.

AIFlowML avatar Jan 06 '25 05:01 AIFlowML

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

> This isn't using Ollama at all, as far as I can tell. I'm glad that it works for you but it's not the same service at all

> My GPU utilization ran to 100% using it. If it's not the same service, have you found what it is using? Output was extremely slow for me so maybe it started an instance using the integrated graphics on my 13900K while I was running another instance through Msty on my Nvidia GPU.

Pretty sure, yeah. That's the LlamaService, which uses either a local GGUF model or an Ollama model through the /generate API. What I'm looking at is the handleOllama function, which should call the OpenAI-compatible API from a running Ollama server. Not sure why these are different implementations, tbh; this repo is kind of hairy.

deepfates avatar Jan 06 '25 18:01 deepfates
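
For illustration, a minimal sketch of what a handleOllama-style call against the OpenAI-compatible endpoint could look like; only the function name comes from the discussion above, and the request/response shapes follow the public API Ollama exposes at /v1/chat/completions, not the actual eliza implementation:

```typescript
// Hypothetical sketch only; not the eliza implementation.
async function handleOllamaSketch(model: string, prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model, // exact Ollama model tag, e.g. "llama3.2:1b"
      messages: [{ role: "user", content: prompt }],
    }),
  });

  if (!res.ok) {
    throw new Error(`Ollama request failed: ${res.status} ${res.statusText}`);
  }

  const data = await res.json();
  // OpenAI-compatible responses return the text at choices[0].message.content.
  return data.choices[0].message.content;
}
```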

It's a common error of the local model and related to the naming only.

AIFlowML avatar Jan 12 '25 10:01 AIFlowML

Lol no it's not but okay

deepfates avatar Jan 12 '25 19:01 deepfates

I'm actively on this task, including developing our own local LLM and rebuilding the underlying local execution so it also works on Apple Silicon.

AIFlowML avatar Jan 13 '25 01:01 AIFlowML