
[FR] Allow custom OpenAI API base URL

TigerBeanst opened this issue 1 year ago · 15 comments

1~3 main use cases of the proposed feature

For now, it looks like the AI container only sends requests to OpenAI at api.openai.com. It would be great if the base URL could be customized.

TigerBeanst avatar Oct 28 '24 09:10 TigerBeanst

We are using the langchain Python package to build clients for OpenAI, so it might be possible to use OPENAI_BASE_URL to switch the URL. We will need time to experiment with whether this works properly. The biggest issue is whether the currently available compatibility modes (for example, the one provided by Ollama) can actually accept the request payload from the AppFlowy AI service without any changes at all. If it is only a proxy server that relays the request to OpenAI, then this will probably work.

Reference for the latest API doc: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI

Specifically, this line:

```python
self.openai_api_base = self.openai_api_base or os.getenv("OPENAI_API_BASE")
```
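
For reference, a minimal sketch of how that fallback could be used with langchain's OpenAI chat client, assuming the alternate endpoint (the URL below is a hypothetical placeholder) speaks the OpenAI chat completions protocol:

```python
import os
from langchain_openai import ChatOpenAI

# Hypothetical OpenAI-compatible endpoint; replace with your own server.
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"

# langchain falls back to OPENAI_API_BASE when openai_api_base is not set
# explicitly, so this client should route requests to the custom endpoint.
chat = ChatOpenAI(
    model="gpt-4o-mini",  # illustrative model name
    api_key=os.getenv("AI_OPENAI_API_KEY", "dummy"),
)
print(chat.invoke("Hello from a custom base URL").content)
```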

khorshuheng avatar Nov 08 '24 04:11 khorshuheng

I spent some time looking into this: as it stands right now, even using OPENAI_API_BASE doesn't help, because we still check for API key validity even when OPENAI_API_BASE is set. I will need to see whether it is possible to fix this.

khorshuheng avatar Nov 08 '24 07:11 khorshuheng

This might be difficult to solve: we are using OpenAIEmbeddings, and for some reason pydantic validation fails when the OpenAI API key is not supplied, similar to the issue reported here: https://github.com/langchain-ai/langchain/issues/7251

Also, the validation of the OpenAI API key is not sent to the proxy or alternate base URL, but to the actual OpenAI endpoint: platform.openai.com.

I assume (apologies in advance if I am wrong) that the primary motivation for having a custom OpenAI API base URL is to use Ollama hosted on a server with OpenAI compatibility. For that use case, it doesn't appear that an alternate base URL alone can resolve the issue. Instead, actual support for Ollama-based embeddings is needed in the AppFlowy AI service: https://python.langchain.com/docs/integrations/text_embedding/ollama/
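
For illustration, a rough sketch of what Ollama-based embeddings look like via langchain's dedicated integration (per the linked doc); the host URL and model name below are placeholders:

```python
from langchain_ollama import OllamaEmbeddings

# Placeholder host and model; point these at your own Ollama server.
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://ollama.internal:11434",
)

vectors = embeddings.embed_documents(["AppFlowy is an open source workspace."])
print(len(vectors[0]))  # dimensionality of the embedding vector
```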

khorshuheng avatar Nov 08 '24 08:11 khorshuheng

I support this. Some users might decide to use alternative providers, such as Anthropic, Google, LLM aggregators like OpenRouter, or even locally deployed models (that support OpenAI-compatible endpoints) to avoid dependency on third-party services.

ArakiSatoshi avatar Dec 26 '24 03:12 ArakiSatoshi

I also add my support for this, as someone who is strongly considering this software for our organization and someone who also self-hosts an Aphrodite Engine instance for inference. It is both much cheaper and more organizationally secure to use our own model for tasks.

As far as the difficulty of implementing an embedding provider goes, the devs can then choose to inform the user of the following options:

  1. If their OpenAI-API-compliant backend does not offer an embedding endpoint, they should migrate to a backend that supports both embeddings and chat completion requests. Aphrodite Engine, vLLM, and Ollama all expose embeddings via OpenAI-style calls (see the sketch after this list).
  2. If the embedding endpoint they prefer does not support OpenAI API client calls, the self-hoster can either write a front-end wrapper to translate the requests or fall back to option 1.
  3. Use OpenRouter and whatever embedding models plus chat models it supports.
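
To make option 1 concrete, here is a minimal sketch of hitting a self-hosted OpenAI-compatible embeddings endpoint with the official openai Python client; the base URL, model name, and key are placeholder assumptions:

```python
from openai import OpenAI

# Placeholder values: any OpenAI-compatible server (vLLM, Aphrodite Engine,
# Ollama, llama.cpp server, ...) that exposes /v1/embeddings should work.
client = OpenAI(base_url="http://inference.internal:8000/v1", api_key="not-needed")

response = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",  # whatever embedding model the backend serves
    input=["AppFlowy AI service test sentence"],
)
print(len(response.data[0].embedding))
```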

In my own JS-based project, Enspira, I decided to write a self-check for the embedding API type that determines whether it is OpenAI-API-compliant, driven by a special configuration toggle. This allows flexibility for this specific kind of use case, though you may not have the bandwidth to accommodate it. I'm not sure how OpenAI-API-compliant infinity-emb's REST API is, but that is usually the de facto tool of choice for self-hosted AI developers serving classifier models. I'd consider supporting that library before anything else for embeddings, personally.
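
As a rough illustration of that self-check idea (translated to Python, since that is what the AppFlowy AI service uses; the URL is a placeholder and the heuristic is only an assumption about how such a probe could work):

```python
import httpx

def looks_openai_compatible(base_url: str, api_key: str = "dummy") -> bool:
    """Heuristic probe: an OpenAI-compatible server should answer /models
    under its /v1 prefix with an OpenAI-style list object."""
    headers = {"Authorization": f"Bearer {api_key}"}
    try:
        resp = httpx.get(f"{base_url.rstrip('/')}/models", headers=headers, timeout=5)
    except httpx.HTTPError:
        return False
    # OpenAI-style servers return {"object": "list", "data": [...]}.
    return resp.status_code == 200 and resp.json().get("object") == "list"

print(looks_openai_compatible("http://inference.internal:8000/v1"))
```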

I'd love to help with any feedback the dev team might need, given that I've tackled RAG/RIG using self-hosted embedding and chat models (albeit in a different language).

prolix-oc avatar Jan 17 '25 05:01 prolix-oc

Related App issue:

  • AppFlowy-IO/AppFlowy#5379

almereyda avatar Feb 08 '25 17:02 almereyda

I really don't understand how it is possible to support Azure OpenAI API endpoints but not custom URLs. This feature is really needed for something truly self-hosted. Local Ollama is not the same; I don't have a big GPU on my laptop or phone, and I want to rely on a third party (OpenRouter, etc.).

Thank you for all the work on AppFlowy, by the way; it is a really nice piece of software.

mtoniott avatar May 09 '25 16:05 mtoniott

I'm going to join those who really want this feature.

> I assume (apologies in advance if I am wrong) that the primary motivation for having a custom OpenAI API base URL is to use Ollama hosted on a server with OpenAI compatibility.

In my specific case I run a local model via llama-server, which provides an OpenAI-compatible interface, and many other services out there provide OpenAI-compatible APIs too. So just having the possibility to replace the base URL would be enough, at least for my use case; there is no need to go deeper and support embeddings directly in the app.

> we are using OpenAIEmbeddings, and for some reason pydantic validation fails when the OpenAI API key is not supplied

Is it possible to just pass a dummy key if a custom URL is set and the key isn't set, so pydantic won't complain?

> even using OPENAI_API_BASE doesn't help, because we still check for API key validity even when OPENAI_API_BASE is set.

I've just checked it (by setting OPENAI_API_BASE in my .env and also keeping AI_OPENAI_API_KEY), but it seems like it still points to OpenAI.
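
For what it's worth, a minimal sketch of that dummy-key idea against langchain's OpenAIEmbeddings, assuming the custom endpoint (placeholder URL below) implements /v1/embeddings and ignores the key's value:

```python
import os
from langchain_openai import OpenAIEmbeddings

custom_base = os.getenv("OPENAI_API_BASE", "http://llama-server.internal:8080/v1")

# Supply a placeholder key only so pydantic validation passes; an
# OpenAI-compatible self-hosted server typically ignores its value.
embeddings = OpenAIEmbeddings(
    openai_api_base=custom_base,
    openai_api_key=os.getenv("AI_OPENAI_API_KEY") or "sk-dummy",
    model="text-embedding-3-small",  # illustrative; use whatever the server maps
    check_embedding_ctx_length=False,  # some non-OpenAI servers reject tokenized input
)
print(len(embeddings.embed_query("hello")))
```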

blacklight avatar May 10 '25 22:05 blacklight

> In my specific case I run a local model via llama-server, which provides an OpenAI-compatible interface. [...] I've just checked it (by setting OPENAI_API_BASE in my .env and also keeping AI_OPENAI_API_KEY), but it seems like it still points to OpenAI.

If you run a local llama model, you can use the Local AI plugin, in which the desktop will interact directly with the Ollama endpoint, instead of going through the AI service.

khorshuheng avatar May 10 '25 23:05 khorshuheng

> If you run a local llama model, you can use the Local AI plugin, in which the desktop will interact directly with the Ollama endpoint, instead of going through the AI service.

I haven't managed to run it on Linux: nothing happens when starting the binary, and the application keeps reporting "The Local AI app was not installed correctly."

Also, I'm not sure this is what I'm looking for. I don't need an LLM to run locally on my own desktop (and even less on all the devices where I use the app). I already have a more powerful machine that exposes a llama web server compatible with OpenAI; I just need the application to be able to connect to it.

blacklight avatar May 11 '25 00:05 blacklight

The Local AI plugin can connect to any Ollama endpoint, not just one running on the local desktop. It's just that this plugin works only on desktop.

You will need to configure the endpoint to point to your public Ollama endpoint in the Local AI settings.

That being said, we might have to check whether the Ollama REST endpoint is compatible with the llama.cpp server. It seems like they might not be.
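
For anyone curious, a rough probe sketch of the difference (the hosts and ports below are the usual defaults, used here only as assumptions): Ollama's native REST API lives under /api, while llama.cpp's server speaks OpenAI-style /v1 routes (newer Ollama builds also expose a /v1 compatibility prefix):

```python
import httpx

def probe(base: str) -> None:
    # Ollama's native API answers /api/tags (model list), while an
    # OpenAI-compatible server such as llama.cpp's answers /v1/models.
    for path in ("/api/tags", "/v1/models"):
        try:
            status = httpx.get(base.rstrip("/") + path, timeout=5).status_code
        except httpx.HTTPError:
            status = None
        print(f"{base}{path} -> {status}")

probe("http://localhost:11434")  # typical Ollama port
probe("http://localhost:8080")   # typical llama.cpp server port
```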

khorshuheng avatar May 11 '25 00:05 khorshuheng

While Ollama is a very widely used option for running LLMs locally, it's far from the only one, and almost every other one provides an OpenAI-API-compatible endpoint. My example is LM Studio using the mlx backend, which runs better in a number of scenarios on Apple Silicon Macs. I'm not against Ollama, but, for my example, their implementation of mlx is stuck in this PR for the time being, and I'd like at least not to fill my drive space with separate models just for AppFlowy. As for embeddings: apps like AnythingLLM and others support multiple different embedding providers. Is there a chance that a similar approach could help here?

Sekanato avatar Jul 07 '25 20:07 Sekanato

Ollama is really an option for personal usage or 3~4 users at most.

For team-based, enterprise, and production usage, Ollama performance is very bad.

See this comparison from August (using a tuned Ollama; the default Ollama is even worse): https://developers.redhat.com/articles/2025/08/08/ollama-vs-vllm-deep-dive-performance-benchmarking

[Throughput comparison chart]

It seems like vLLM is over 500x faster than Ollama for concurrent requests.

[Responsiveness comparison chart]

Ollama can take 2 minutes (even 5 minutes when untuned) to start replying to a query when 64 are scheduled in parallel.

Disclaimer: vLLM is developed by Red Hat; however, the benchmarks can easily be reproduced locally.

Other frameworks like SGLang or TensorRT-LLM are even tuned for 1000+ concurrent queries.

Besides, as AI usage grows with automation, agentic workflows, deep search, and auto-tagging, even a single user can generate 10+ queries in parallel.

mratsim avatar Sep 05 '25 10:09 mratsim

Adding my +1 to this feature request. I have access to OpenAI-API-compatible LLMs via mammouth.ai (an aggregator/reseller like OpenRouter) and would really love to be able to use them in AppFlowy.

dankerthrone avatar Sep 07 '25 20:09 dankerthrone

I guess most has been said: I really want this feature. Ollama doesn't work as well for me as other options. I've got an AMD GPU, and Vulkan is much more reliable and performant now, but Ollama does not seem to work with it. Also, my Ollama does not seem to handle loading and unloading different models repeatedly; it crashes hard. Generally my AI setup works with different apps and frontends, just not with AppFlowy right now.

matthijsbro avatar Oct 01 '25 20:10 matthijsbro