Add Ollama Cloud API Support
https://llm.datasette.io/en/stable/index.html
https://deepwiki.com/search/does-it-support-ollama-cloud_658b4538-a837-4cae-9b14-c6ab8c316547
DeepWiki Q&A with Code Context for Repository: simonw/llm
## Q1
Does it support ollama cloud
### Answer
Based on the codebase context, you're asking about whether LLM supports Ollama Cloud.
LLM supports Ollama through the llm-ollama plugin, but this is specifically for local models run using Ollama, not Ollama Cloud. llm:13-13
The plugin is listed under the "Local models" section of the plugin directory llm:7-9 , and the documentation explicitly describes it as adding "support for local models run using Ollama". llm:13-13
The quick start guide demonstrates using Ollama for local model execution: llm:102-111
```bash
# Install the plugin
llm install llm-ollama

# Download the llama3.2 model and run a prompt against it
ollama pull llama3.2:latest
llm -m llama3.2:latest 'What is the capital of France?'
```
There is no mention of Ollama Cloud support in the codebase. The llm-ollama plugin is designed to work with Ollama running locally on your machine.
### Notes
If you need to access cloud-hosted models, LLM supports various remote API providers through other plugins listed in the "Remote APIs" section llm:20-22 , including providers like Anthropic, Google Gemini, Mistral, and others. However, none of these are specifically for Ollama Cloud.
### Citations
File: docs/plugins/directory.md (L7-9)
## Local models
These plugins all help you run LLMs directly on your own computer:
File: docs/plugins/directory.md (L13-13)
- **[llm-ollama](https://github.com/taketwo/llm-ollama)** adds support for local models run using [Ollama](https://ollama.ai/).
File: docs/plugins/directory.md (L20-22)
## Remote APIs
These plugins can be used to interact with remotely hosted models via their API:
File: README.md (L102-111)
You can also [install a plugin](https://llm.datasette.io/en/stable/plugins/installing-plugins.html#installing-plugins) to access models that can run on your local device. If you use [Ollama](https://ollama.com/):
```bash
# Install the plugin
llm install llm-ollama
# Download and run a prompt against the Orca Mini 7B model
ollama pull llama3.2:latest
llm -m llama3.2:latest 'What is the capital of France?'
```
## Q2
how would you add ollama cloud to this?
### Answer
You're asking about how to add support for **Ollama Cloud** (a cloud-hosted version of Ollama) to the LLM project, which currently only supports local Ollama models through the `llm-ollama` plugin.
To add Ollama Cloud support, you would need to create a new plugin that implements the LLM plugin system's model registration hooks llm:13-15 . Here's how:
### 1. Create a Plugin with API Key Support
Since Ollama Cloud would be a remote API service, you'd subclass `llm.KeyModel` instead of `llm.Model` llm:31-39 . This enables the standard API key management through `llm keys set`.
### 2. Implement the Model Class
Your plugin would follow this structure llm:18-32 :
```python
import llm


@llm.hookimpl
def register_models(register):
    register(OllamaCloud())


class OllamaCloud(llm.KeyModel):
    model_id = "ollama-cloud-model-name"
    needs_key = "ollama-cloud"
    key_env_var = "OLLAMA_CLOUD_API_KEY"

    def execute(self, prompt, stream, response, conversation, key=None):
        # Make API calls to the Ollama Cloud endpoint using the provided key
        ...
```
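For concreteness, here is a minimal sketch of how that `execute()` body might look. It assumes Ollama Cloud exposes an Ollama-style `/api/chat` endpoint at https://ollama.com/api/chat that accepts a bearer token and streams newline-delimited JSON; those details are assumptions, not confirmed API behaviour, and `httpx` is just one reasonable HTTP client choice:

```python
import json

import httpx
import llm


class OllamaCloud(llm.KeyModel):
    model_id = "ollama-cloud-model-name"
    needs_key = "ollama-cloud"
    key_env_var = "OLLAMA_CLOUD_API_KEY"

    def execute(self, prompt, stream, response, conversation, key=None):
        # Assumed endpoint and auth scheme; adjust to whatever Ollama Cloud actually exposes
        url = "https://ollama.com/api/chat"
        headers = {"Authorization": f"Bearer {key}"}
        payload = {
            "model": self.model_id,
            "messages": [{"role": "user", "content": prompt.prompt}],
            "stream": stream,
        }
        if stream:
            # Ollama-style streaming returns one JSON object per line
            with httpx.stream("POST", url, json=payload, headers=headers, timeout=None) as r:
                r.raise_for_status()
                for line in r.iter_lines():
                    if line:
                        yield json.loads(line)["message"]["content"]
        else:
            r = httpx.post(url, json=payload, headers=headers, timeout=None)
            r.raise_for_status()
            yield r.json()["message"]["content"]
```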
### 3. Consider Async Support
For better performance with remote APIs, you should also implement an async version llm:52-96 :
```python
class OllamaCloudAsync(llm.AsyncKeyModel):
    model_id = "ollama-cloud-model-name"

    async def execute(self, prompt, stream, response, conversation=None, key=None):
        # Async API calls to Ollama Cloud
        ...
```
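Filling that in, a sketch of the async version, using the same assumed endpoint and response shape as the sync sketch above and `httpx.AsyncClient` for the HTTP calls:

```python
import json

import httpx
import llm


class OllamaCloudAsync(llm.AsyncKeyModel):
    model_id = "ollama-cloud-model-name"
    needs_key = "ollama-cloud"
    key_env_var = "OLLAMA_CLOUD_API_KEY"

    async def execute(self, prompt, stream, response, conversation=None, key=None):
        # Same assumed Ollama Cloud endpoint as the sync sketch above
        url = "https://ollama.com/api/chat"
        headers = {"Authorization": f"Bearer {key}"}
        payload = {
            "model": self.model_id,
            "messages": [{"role": "user", "content": prompt.prompt}],
            "stream": stream,
        }
        async with httpx.AsyncClient(timeout=None) as client:
            if stream:
                # One JSON object per line, Ollama-style NDJSON streaming
                async with client.stream("POST", url, json=payload, headers=headers) as r:
                    r.raise_for_status()
                    async for line in r.aiter_lines():
                        if line:
                            yield json.loads(line)["message"]["content"]
            else:
                r = await client.post(url, json=payload, headers=headers)
                r.raise_for_status()
                yield r.json()["message"]["content"]
```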
### 4. Register Both Versions
Register both sync and async models together llm:48-61 :
```python
@llm.hookimpl
def register_models(register):
    register(OllamaCloud(), OllamaCloudAsync(), aliases=("ollama-cloud",))
```
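If the plugin should expose more than one cloud model, the same hook can register them in a loop. This is only a sketch: it assumes both classes gain an `__init__(self, model_id)`, and the model list is illustrative (only `deepseek-v3.1:671b-cloud` is mentioned later in this thread; anything else would need checking against https://ollama.com/search?c=cloud):

```python
import llm

# Hypothetical list of cloud model tags this plugin should expose
CLOUD_MODELS = [
    "deepseek-v3.1:671b-cloud",
]


@llm.hookimpl
def register_models(register):
    for name in CLOUD_MODELS:
        # Assumes OllamaCloud/OllamaCloudAsync accept the model tag via __init__
        register(OllamaCloud(name), OllamaCloudAsync(name))
```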
### 5. Package Structure
Create a pyproject.toml file to define the plugin entry point llm:40-49 :
```toml
[project]
name = "llm-ollama-cloud"
version = "0.1"

[project.entry-points.llm]
ollama_cloud = "llm_ollama_cloud"
```
### Notes
The existing llm-ollama plugin is specifically designed for local models llm:13-13 , so a separate plugin would be the cleanest approach. You could look at other remote API plugins like llm-anthropic or llm-gemini llm:24-26 as reference implementations for how to structure API calls, handle streaming responses, and manage authentication.
If Ollama Cloud provides an OpenAI-compatible API, you could alternatively configure it using the extra-openai-models.yaml approach llm:25-41 without writing a custom plugin.
### Citations
File: llm/hookspecs.py (L13-15)
```python
@hookspec
def register_models(register):
    "Register additional model instances representing LLM models that can be called"
```
File: docs/plugins/advanced-model-plugins.md (L31-39)
If your plugin requires an API key you should subclass the `llm.KeyModel` class instead of the `llm.Model` class. Start your model definition like this:
```python
import llm

class HostedModel(llm.KeyModel):
    needs_key = "hosted"  # Required
    key_env_var = "HOSTED_API_KEY"  # Optional
```
File: docs/plugins/advanced-model-plugins.md (L52-96)
## Async models
Plugins can optionally provide an asynchronous version of their model, suitable for use with Python [asyncio](https://docs.python.org/3/library/asyncio.html). This is particularly useful for remote models accessible by an HTTP API.
The async version of a model subclasses `llm.AsyncModel` instead of `llm.Model`. It must implement an `async def execute()` async generator method instead of `def execute()`.
This example shows a subset of the OpenAI default plugin illustrating how this method might work:
```python
from typing import AsyncGenerator
import llm

class MyAsyncModel(llm.AsyncModel):
    # This can duplicate the model_id of the sync model:
    model_id = "my-model-id"

    async def execute(
        self, prompt, stream, response, conversation=None
    ) -> AsyncGenerator[str, None]:
        if stream:
            completion = await client.chat.completions.create(
                model=self.model_id,
                messages=messages,
                stream=True,
            )
            async for chunk in completion:
                yield chunk.choices[0].delta.content
        else:
            completion = await client.chat.completions.create(
                model=self.model_name or self.model_id,
                messages=messages,
                stream=False,
            )
            if completion.choices[0].message.content is not None:
                yield completion.choices[0].message.content
```
If your model takes an API key you should instead subclass `llm.AsyncKeyModel` and have a `key=` parameter on your `.execute()` method:
```python
class MyAsyncModel(llm.AsyncKeyModel):
    ...
    async def execute(
        self, prompt, stream, response, conversation=None, key=None
    ) -> AsyncGenerator[str, None]:
```
File: docs/plugins/tutorial-model-plugin.md (L18-32)
In that directory create a file called `llm_markov.py` containing this:
```python
import llm


@llm.hookimpl
def register_models(register):
    register(Markov())


class Markov(llm.Model):
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        return ["hello world"]
```
File: docs/plugins/tutorial-model-plugin.md (L40-49)
Next, create a `pyproject.toml` file. This is necessary to tell LLM how to load your plugin:
```toml
[project]
name = "llm-markov"
version = "0.1"

[project.entry-points.llm]
markov = "llm_markov"
```
File: docs/plugins/plugin-hooks.md (L48-61)
If your model includes an async version, you can register that too:
```python
class AsyncHelloWorld(llm.AsyncModel):
    model_id = "helloworld"

    async def execute(self, prompt, stream, response):
        return ["hello world"]


@llm.hookimpl
def register_models(register):
    register(HelloWorld(), AsyncHelloWorld(), aliases=("hw",))
```
This demonstrates how to register a model with both sync and async versions, and how to specify an alias for that model.
File: docs/plugins/directory.md (L13-13)
- **[llm-ollama](https://github.com/taketwo/llm-ollama)** adds support for local models run using [Ollama](https://ollama.ai/).
File: docs/plugins/directory.md (L24-26)
- **[llm-mistral](https://github.com/simonw/llm-mistral)** adds support for [Mistral AI](https://mistral.ai/)'s language and embedding models.
- **[llm-gemini](https://github.com/simonw/llm-gemini)** adds support for Google's [Gemini](https://ai.google.dev/docs) models.
- **[llm-anthropic](https://github.com/simonw/llm-anthropic)** supports Anthropic's [Claude 3 family](https://www.anthropic.com/news/claude-3-family), [3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) and beyond.
File: docs/other-models.md (L25-41)
(openai-compatible-models)=
## OpenAI-compatible models
Projects such as [LocalAI](https://localai.io/) offer a REST API that imitates the OpenAI API but can be used to run other models, including models that can be installed on your own machine. These can be added using the same configuration mechanism.
The `model_id` is the name LLM will use for the model. The `model_name` is the name which needs to be passed to the API - this might differ from the `model_id`, especially if the `model_id` could potentially clash with other installed models.
The `api_base` key can be used to point the OpenAI client library at a different API endpoint.
To add the `orca-mini-3b` model hosted by a local installation of [LocalAI](https://localai.io/), add this to your `extra-openai-models.yaml` file:
```yaml
- model_id: orca-openai-compat
  model_name: orca-mini-3b.ggmlv3
  api_base: "http://localhost:8080"
```
Using cloud models in Ollama works practically the same as using local models in Ollama: the only difference is that for cloud models the Ollama server relays requests to the cloud. You still make the same API calls to your own Ollama server, you just specify that it is a cloud model via the model tag.

As such, LLM is already compatible with Ollama cloud models:

- You may need to first use `ollama run ...` or `ollama pull ...` with your desired cloud model and tag so that it shows up in your list of Ollama models, e.g. `ollama run deepseek-v3.1:671b-cloud`.
- The full list of Ollama cloud models is here: https://ollama.com/search?c=cloud
- Note: some weights might be local-only, like `qwen3-coder:30b`. Cloud models will have a `-cloud` suffix on the model tag.
- See https://docs.ollama.com/cloud for details.
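For example, from LLM's Python API (a sketch assuming `llm-ollama` is installed and the cloud model has already been pulled so that it appears in `llm models`):

```python
import llm

# "deepseek-v3.1:671b-cloud" is the cloud model pulled above; the local
# Ollama server relays this request to Ollama's cloud.
model = llm.get_model("deepseek-v3.1:671b-cloud")
response = model.prompt("What is the capital of France?")
print(response.text())
```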
ollama.com also now (somewhat confusingly) offers a cloud API, where you can use https://ollama.com as the Ollama server URL with an API key from https://ollama.com/settings/keys.

This already works with LLM as well: just run `llm keys set ollama` to configure your ollama.com API key, and set the `OLLAMA_HOST` environment variable to https://ollama.com.
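And a minimal sketch of that same setup driven from Python rather than the shell, assuming (as described above, not verified here) that llm-ollama honours `OLLAMA_HOST` when its client is created and uses the key stored under `ollama`:

```python
import os

# Mirror of `export OLLAMA_HOST=https://ollama.com`; set it before llm loads
# the llm-ollama plugin so its Ollama client is created pointing at ollama.com.
os.environ["OLLAMA_HOST"] = "https://ollama.com"

import llm  # imported after setting the environment variable on purpose

# Assumes `llm keys set ollama` has already stored your ollama.com API key.
model = llm.get_model("deepseek-v3.1:671b-cloud")
print(model.prompt("What is the capital of France?").text())
```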