
Ollama provider doesn't use correct endpoint

Open deepfates opened this issue 1 year ago • 13 comments

Describe the bug

The Ollama model provider sends a POST to localhost:11434/api/chat instead of the correct endpoint, which should be either localhost:11434/v1/chat/completions (OpenAI-compatible) or localhost:11434/api/generate (Ollama native API).
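
For reference, a minimal sketch of a call against the native generate endpoint mentioned above, assuming a default local Ollama install on port 11434; the payload fields follow the public Ollama API documentation and are not taken from the eliza codebase:

```typescript
// Minimal sketch, not from the eliza source: a direct call to Ollama's native
// generate endpoint on a default local install. The OpenAI-compatible
// alternative lives at /v1/chat/completions on the same server.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2:1b", // must match the exact local model tag
    prompt: "Hello",
    stream: false,
  }),
});
const { response } = await res.json(); // generated text is returned in `response`
```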

To Reproduce

Set OLLAMA_MODEL to a local model in .env and "modelProvider": "ollama" in the character file.
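
Concretely, a minimal configuration along these lines should reproduce it (the model tag here is only an example):

```
# .env
OLLAMA_MODEL="llama3.2:1b"

# character file (fragment)
"modelProvider": "ollama"
```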

Expected behavior

Get a response instead of AI_APICallError

deepfates avatar Dec 07 '24 02:12 deepfates

This is weird; I've been using the Ollama provider exclusively for a while now. I know this part of the code, and your suggestion about making it more compatible with the OpenAI API is interesting.

djaramil avatar Dec 12 '24 16:12 djaramil

It works for you? What endpoint is it calling on the ollama server when it's successful?

deepfates avatar Dec 13 '24 00:12 deepfates

@djaramil aren't you getting infinite responses from ollama or llama_local?

dr-fusion avatar Dec 18 '24 07:12 dr-fusion

> It works for you? What endpoint is it calling on the ollama server when it's successful?

Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

Etherdrake avatar Dec 19 '24 22:12 Etherdrake

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

I tried with llama_local as well and it goes into an infinite loop, answering older questions etc., even after using it for 2-4 hours. I'm using it on an M3 Max with 128 GB, and I made a change to handle Metal as well.

dr-fusion avatar Dec 20 '24 08:12 dr-fusion

> M3 Max with 128 GB

I am not well-versed in macOS, but if I had to guess, the GPU is not powerful enough for the model you're using. Maybe try something like Gemma:2B.

Etherdrake avatar Dec 20 '24 12:12 Etherdrake

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

This isn't using Ollama at all, as far as I can tell. I'm glad that it works for you but it's not the same service at all

deepfates avatar Dec 29 '24 00:12 deepfates

I'm interested in looking into this. Are you all still having issues with the actual Ollama provider?

AIFlowML avatar Jan 03 '25 04:01 AIFlowML

Not sure if this will help, but I had this same issue when I wasn't using the exact model name for the OLLAMA_MODEL env var. For example, for llama3.2:1b it had to be OLLAMA_MODEL="llama3.2:1b" and not OLLAMA_MODEL="llama3.2".

james-ingold avatar Jan 03 '25 20:01 james-ingold

When I try to use my Ollama models, the agent just gives me this empty error message. It keeps trying, but never works:

```
ℹ INFORMATIONS Generating text with options: {"modelProvider":"ollama","model":"large"}
ℹ INFORMATIONS Selected model: llama3.2:latest
⛔ ERRORS Error in generateText: {}
⛔ ERRORS ERROR: {}
```

squintdev avatar Jan 03 '25 20:01 squintdev

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

> This isn't using Ollama at all, as far as I can tell. I'm glad that it works for you but it's not the same service at all

My GPU utilization ran to 100% using it. If it's not the same service, have you found what it is using? Output was extremely slow for me so maybe it started an instance using the integrated graphics on my 13900K while I was running another instance through Msty on my Nvidia GPU.

Etherdrake avatar Jan 04 '25 16:01 Etherdrake

It's possible that this issue is just related to the naming of the models in Ollama. Requesting feedback.

AIFlowML avatar Jan 06 '25 05:01 AIFlowML

> It works for you? What endpoint is it calling on the ollama server when it's successful?

> Have you tried setting "modelProvider": "llama_local"? For me the model responses are just slow, even on GPU, and sometimes the agent sends answers to old context. After running the bot for a while it gets better and responses become instant.

> This isn't using Ollama at all, as far as I can tell. I'm glad that it works for you but it's not the same service at all

> My GPU utilization ran to 100% using it. If it's not the same service, have you found what it is using? Output was extremely slow for me so maybe it started an instance using the integrated graphics on my 13900K while I was running another instance through Msty on my Nvidia GPU.

Pretty sure, yeah. That's the LlamaService, which uses either a local GGUF model or an Ollama model through the /generate API. What I'm looking at is the handleOllama function, which should call the OpenAI-compatible API from a running Ollama server. Not sure why these are different implementations, tbh; this repo is kind of hairy.

deepfates avatar Jan 06 '25 18:01 deepfates
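
For illustration, a minimal sketch of what a handleOllama-style call against the OpenAI-compatible endpoint could look like; only the function name comes from the discussion above, and the request/response shapes follow the public API Ollama exposes at /v1/chat/completions, not the actual eliza implementation:

```typescript
// Hypothetical sketch only; not the eliza implementation.
async function handleOllamaSketch(model: string, prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model, // exact Ollama model tag, e.g. "llama3.2:1b"
      messages: [{ role: "user", content: prompt }],
    }),
  });

  if (!res.ok) {
    throw new Error(`Ollama request failed: ${res.status} ${res.statusText}`);
  }

  const data = await res.json();
  // OpenAI-compatible responses return the text at choices[0].message.content.
  return data.choices[0].message.content;
}
```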

It's a common error of the local model and related to the naming only.

AIFlowML avatar Jan 12 '25 10:01 AIFlowML

Lol no it's not but okay

deepfates avatar Jan 12 '25 19:01 deepfates

I'm actively on this task, including developing our own local LLM and rebuilding the underlying local execution so it also works on Apple Silicon.

AIFlowML avatar Jan 13 '25 01:01 AIFlowML