
# Add Ollama Cloud API Support

Open eden-chan opened this issue 3 months ago • 3 comments

https://llm.datasette.io/en/stable/index.html

https://deepwiki.com/search/does-it-support-ollama-cloud_658b4538-a837-4cae-9b14-c6ab8c316547

DeepWiki Q&A with Code Context for Repository: simonw/llm

## Q1

Does it support ollama cloud

### Answer

Based on the codebase context, you're asking whether LLM supports Ollama Cloud.

## Answer

LLM supports Ollama through the llm-ollama plugin, but this is specifically for local models run using Ollama, not Ollama Cloud. llm:13-13

The plugin is listed under the "Local models" section of the plugin directory llm:7-9, and the documentation explicitly describes it as adding "support for local models run using Ollama". llm:13-13

The quick start guide demonstrates using Ollama for local model execution: llm:102-111

```bash
# Install the plugin
llm install llm-ollama

# Download and run a prompt against the Orca Mini 7B model
ollama pull llama3.2:latest
llm -m llama3.2:latest 'What is the capital of France?'
```

There is no mention of Ollama Cloud support in the codebase. The llm-ollama plugin is designed to work with Ollama running locally on your machine.

### Notes

If you need to access cloud-hosted models, LLM supports various remote API providers through other plugins listed in the "Remote APIs" section llm:20-22, including providers like Anthropic, Google Gemini, Mistral, and others. However, none of these are specifically for Ollama Cloud.
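
For example, installing and configuring one of those remote-API plugins follows the same pattern as the Ollama example above (shown here with `llm-anthropic`; check each plugin's README for the exact key name it expects):

```bash
# Install a remote-API plugin, store its key, then see which models it registers
llm install llm-anthropic
llm keys set anthropic
llm models   # the newly available Anthropic model IDs appear in this list
```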

Wiki pages you might want to explore:

  • Overview (simonw/llm)

### Citations

File: docs/plugins/directory.md (L7-9)

```markdown
## Local models

These plugins all help you run LLMs directly on your own computer:
```

File: docs/plugins/directory.md (L13-13)

```markdown
- **[llm-ollama](https://github.com/taketwo/llm-ollama)** adds support for local models run using [Ollama](https://ollama.ai/).
```

File: docs/plugins/directory.md (L20-22)

```markdown
## Remote APIs

These plugins can be used to interact with remotely hosted models via their API:
```

File: README.md (L102-111)

````markdown
You can also [install a plugin](https://llm.datasette.io/en/stable/plugins/installing-plugins.html#installing-plugins) to access models that can run on your local device. If you use [Ollama](https://ollama.com/):

```bash
# Install the plugin
llm install llm-ollama

# Download and run a prompt against the Orca Mini 7B model
ollama pull llama3.2:latest
llm -m llama3.2:latest 'What is the capital of France?'
```
````

## Q2

how would you add ollama cloud to this?


### Answer

You're asking about how to add support for **Ollama Cloud** (a cloud-hosted version of Ollama) to the LLM project, which currently only supports local Ollama models through the `llm-ollama` plugin.

## Answer

To add Ollama Cloud support, you would need to create a new plugin that implements the LLM plugin system's model registration hooks llm:13-15. Here's how:

### 1. Create a Plugin with API Key Support

Since Ollama Cloud would be a remote API service, you'd subclass `llm.KeyModel` instead of `llm.Model` llm:31-39. This enables the standard API key management through `llm keys set`.
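
With a plugin like the sketch in the next step, users would store their key once via `llm keys set` (`ollama-cloud` is the hypothetical `needs_key` value from that sketch, not an existing plugin):

```bash
# Hypothetical key name from the sketch below; the key could also be supplied
# through the OLLAMA_CLOUD_API_KEY environment variable
llm keys set ollama-cloud
```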

### 2. Implement the Model Class

Your plugin would follow this structure llm:18-32:

```python
import llm

@llm.hookimpl
def register_models(register):
    register(OllamaCloud())

class OllamaCloud(llm.KeyModel):
    model_id = "ollama-cloud-model-name"
    needs_key = "ollama-cloud"
    key_env_var = "OLLAMA_CLOUD_API_KEY"

    def execute(self, prompt, stream, response, conversation, key=None):
        # Make API calls to the Ollama Cloud endpoint using the provided key
        # and yield chunks of the response text
        ...
```

### 3. Consider Async Support

For better performance with remote APIs, you should also implement an async version llm:52-96:

```python
class OllamaCloudAsync(llm.AsyncKeyModel):
    model_id = "ollama-cloud-model-name"
    needs_key = "ollama-cloud"

    async def execute(self, prompt, stream, response, conversation=None, key=None):
        # Async API calls to the Ollama Cloud endpoint, yielding response chunks
        ...
```

### 4. Register Both Versions

Register both sync and async models together llm:48-61:

```python
@llm.hookimpl
def register_models(register):
    register(OllamaCloud(), OllamaCloudAsync(), aliases=("ollama-cloud",))
```

### 5. Package Structure

Create a `pyproject.toml` file to define the plugin entry point llm:40-49:

```toml
[project]
name = "llm-ollama-cloud"
version = "0.1"

[project.entry-points.llm]
ollama_cloud = "llm_ollama_cloud"
```

### Notes

The existing `llm-ollama` plugin is specifically designed for local models llm:13-13, so a separate plugin would be the cleanest approach. You could look at other remote API plugins like `llm-anthropic` or `llm-gemini` llm:24-26 as reference implementations for how to structure API calls, handle streaming responses, and manage authentication.

If Ollama Cloud provides an OpenAI-compatible API, you could alternatively configure it using the `extra-openai-models.yaml` approach llm:25-41 without writing a custom plugin.
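
As a rough, untested sketch of that alternative (the `api_base` URL and the model name below are assumptions, not documented Ollama Cloud values), the entry would go into the `extra-openai-models.yaml` file in LLM's configuration directory:

```bash
# Hypothetical sketch: register an Ollama Cloud model as an OpenAI-compatible
# endpoint. The api_base path and model_name are guesses - adjust them to
# whatever Ollama Cloud actually exposes.
cat >> "$(dirname "$(llm logs path)")/extra-openai-models.yaml" <<'EOF'
- model_id: ollama-cloud-deepseek
  model_name: deepseek-v3.1:671b-cloud
  api_base: "https://ollama.com/v1"
EOF
```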

Wiki pages you might want to explore:

  • Overview (simonw/llm)
  • Interactive Chat Interface (simonw/llm)

### Citations

File: llm/hookspecs.py (L13-15)

```python
@hookspec
def register_models(register):
    "Register additional model instances representing LLM models that can be called"
```

File: docs/plugins/advanced-model-plugins.md (L31-39)

````markdown
If your plugin requires an API key you should subclass the `llm.KeyModel` class instead of the `llm.Model` class. Start your model definition like this:

```python
import llm

class HostedModel(llm.KeyModel):
    needs_key = "hosted" # Required
    key_env_var = "HOSTED_API_KEY" # Optional
```
````

File: docs/plugins/advanced-model-plugins.md (L52-96)

````markdown
## Async models

Plugins can optionally provide an asynchronous version of their model, suitable for use with Python [asyncio](https://docs.python.org/3/library/asyncio.html). This is particularly useful for remote models accessible by an HTTP API.

The async version of a model subclasses `llm.AsyncModel` instead of `llm.Model`. It must implement an `async def execute()` async generator method instead of `def execute()`.

This example shows a subset of the OpenAI default plugin illustrating how this method might work:

```python
from typing import AsyncGenerator
import llm

class MyAsyncModel(llm.AsyncModel):
    # This can duplicate the model_id of the sync model:
    model_id = "my-model-id"

    async def execute(
        self, prompt, stream, response, conversation=None
    ) -> AsyncGenerator[str, None]:
        if stream:
            completion = await client.chat.completions.create(
                model=self.model_id,
                messages=messages,
                stream=True,
            )
            async for chunk in completion:
                yield chunk.choices[0].delta.content
        else:
            completion = await client.chat.completions.create(
                model=self.model_name or self.model_id,
                messages=messages,
                stream=False,
            )
            if completion.choices[0].message.content is not None:
                yield completion.choices[0].message.content
```

If your model takes an API key you should instead subclass `llm.AsyncKeyModel` and have a `key=` parameter on your `.execute()` method:

```python
class MyAsyncModel(llm.AsyncKeyModel):
    ...
    async def execute(
        self, prompt, stream, response, conversation=None, key=None
    ) -> AsyncGenerator[str, None]:
```
````

File: docs/plugins/tutorial-model-plugin.md (L18-32)

````markdown
In that directory create a file called `llm_markov.py` containing this:

```python
import llm

@llm.hookimpl
def register_models(register):
    register(Markov())

class Markov(llm.Model):
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        return ["hello world"]
```
````

File: docs/plugins/tutorial-model-plugin.md (L40-49)

````markdown
Next, create a `pyproject.toml` file. This is necessary to tell LLM how to load your plugin:

```toml
[project]
name = "llm-markov"
version = "0.1"

[project.entry-points.llm]
markov = "llm_markov"
```
````

File: docs/plugins/plugin-hooks.md (L48-61)

````markdown
If your model includes an async version, you can register that too:

```python
class AsyncHelloWorld(llm.AsyncModel):
    model_id = "helloworld"

    async def execute(self, prompt, stream, response):
        return ["hello world"]

@llm.hookimpl
def register_models(register):
    register(HelloWorld(), AsyncHelloWorld(), aliases=("hw",))
```

This demonstrates how to register a model with both sync and async versions, and how to specify an alias for that model.
````

File: docs/plugins/directory.md (L13-13)

```markdown
- **[llm-ollama](https://github.com/taketwo/llm-ollama)** adds support for local models run using [Ollama](https://ollama.ai/).
```

File: docs/plugins/directory.md (L24-26)

```markdown
- **[llm-mistral](https://github.com/simonw/llm-mistral)** adds support for [Mistral AI](https://mistral.ai/)'s language and embedding models.
- **[llm-gemini](https://github.com/simonw/llm-gemini)** adds support for Google's [Gemini](https://ai.google.dev/docs) models.
- **[llm-anthropic](https://github.com/simonw/llm-anthropic)** supports Anthropic's [Claude 3 family](https://www.anthropic.com/news/claude-3-family), [3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) and beyond.
```

File: docs/other-models.md (L25-41)

````markdown
(openai-compatible-models)=

## OpenAI-compatible models

Projects such as [LocalAI](https://localai.io/) offer a REST API that imitates the OpenAI API but can be used to run other models, including models that can be installed on your own machine. These can be added using the same configuration mechanism.

The `model_id` is the name LLM will use for the model. The `model_name` is the name which needs to be passed to the API - this might differ from the `model_id`, especially if the `model_id` could potentially clash with other installed models.

The `api_base` key can be used to point the OpenAI client library at a different API endpoint.

To add the `orca-mini-3b` model hosted by a local installation of [LocalAI](https://localai.io/), add this to your `extra-openai-models.yaml` file:

```yaml
- model_id: orca-openai-compat
  model_name: orca-mini-3b.ggmlv3
  api_base: "http://localhost:8080"
```
````

eden-chan · Oct 04 '25 16:10

using cloud models in ollama works practically the same as using local models in ollama - the only difference is that for cloud models the ollama server relays requests to the cloud. you still make the same API calls to your own ollama server; you just specify that it's a cloud model via the model tag

as such, llm is already compatible with ollama cloud models:

*[screenshot of a terminal session demonstrating ollama cloud models in llm]*

pharmacologic · Oct 05 '25 03:10

you may need to first use `ollama run ...` or `ollama pull ...` with your desired cloud model + tag so that it shows up in your list of ollama models.

e.g. `ollama run deepseek-v3.1:671b-cloud`

the full list of ollama cloud models is here: https://ollama.com/search?c=cloud

note: some weights might be local-only, like `qwen3-coder:30b`. cloud models will have a `-cloud` suffix on the model tag

https://docs.ollama.com/cloud
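
Putting the two comments above together, a minimal sketch of that workflow (assuming the `llm-ollama` plugin is installed and a local Ollama server is running; the model tag is the example from above):

```bash
# Install the Ollama plugin for llm, if you haven't already
llm install llm-ollama

# Pull a cloud model so it shows up in your local Ollama model list;
# the local Ollama server relays requests for it to Ollama's cloud
ollama pull deepseek-v3.1:671b-cloud
ollama list   # the -cloud model should now appear here

# Prompt it through llm exactly as you would a local model
llm -m deepseek-v3.1:671b-cloud 'What is the capital of France?'
```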

pharmacologic · Oct 05 '25 03:10

ollama.com also now (somewhat confusingly) offers a cloud API, where you can use https://ollama.com for the ollama server URL and use an API key from https://ollama.com/settings/keys

this already works with llm as well - just use `llm keys set ollama` to configure your ollama.com API key, and set the `OLLAMA_HOST` environment variable to https://ollama.com

*[terminal session showing that llm can connect to the ollama.com cloud API]*
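
In command form, that setup looks roughly like this (a sketch based on the comment above; the model tag is reused from the earlier example and may be named differently on ollama.com):

```bash
# Store your ollama.com API key (create one at https://ollama.com/settings/keys)
llm keys set ollama

# Point the llm-ollama plugin at ollama.com instead of a local Ollama server
export OLLAMA_HOST=https://ollama.com

# Prompt a cloud model directly through the ollama.com API
llm -m deepseek-v3.1:671b-cloud 'What is the capital of France?'
```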

pharmacologic · Oct 05 '25 03:10