
Implement OpenAI-Compatible Client for vLLM and Ollama (#191)

Open • Josephrp opened this issue 7 months ago • 3 comments

Objective: Develop a Python client library within the factorio-learning-environment repository that provides an OpenAI-compatible interface for interacting with the vLLM and Ollama APIs, as outlined in the task list from issue #191. The client should abstract the differences between vLLM and Ollama, offering a unified interface where possible while still supporting platform-specific features and parameters.

Related Issue: #191


Task List

  • [ ] Design Client Interface:
    • Define OpenAICompatibleClient class to mimic openai.OpenAI interface.
    • Identify common endpoints (Completions, Chat Completions, Embeddings) and platform-specific endpoints (vLLM: Tokenizer, Classification; Ollama: Model Management).
    • Design handling of platform-specific parameters (vLLM: top_k; Ollama: num_ctx).
  • [ ] Create Configuration Schema:
    • Define a pydantic-based configuration model for client settings (base_url, api_key, model, platform).
    • Include platform-specific options (vLLM: chat_template; Ollama: keep_alive).
  • [ ] Implement Base Client Class:
    • Create OpenAICompatibleClient in agents/utils/openai_compatible_client.py.
    • Implement initialization with configuration (base_url, api_key, platform).
    • Add factory method to instantiate vLLM or Ollama handlers based on platform (a rough sketch appears after this task list).
  • [ ] Implement Common Endpoints:
    • Completions API (/v1/completions for vLLM, /api/generate for Ollama):
      • [ ] Implement completions.create method.
      • [ ] Handle parameters (model, prompt, stream, platform-specific options).
      • [ ] Normalize responses to OpenAI schema (choices, usage).
    • Chat Completions API (/v1/chat/completions for vLLM, /api/chat for Ollama):
      • [ ] Implement chat.completions.create method.
      • [ ] Support messages, tools, stream, and platform-specific parameters.
      • [ ] Handle multi-modal inputs (images for Ollama’s llava, vLLM’s VLM2Vec).
      • [ ] Normalize streaming and non-streaming responses.
    • Embeddings API (/v1/embeddings for vLLM, /api/embed for Ollama):
      • [ ] Implement embeddings.create method.
      • [ ] Support input (text/messages) and model.
      • [ ] Handle platform-specific parameters (vLLM: chat_template; Ollama: truncate).
  • [ ] Implement vLLM-Specific Endpoints:
    • [ ] Tokenizer API (/tokenize, /detokenize): Implement tokenizer.encode and tokenizer.decode.
    • [ ] Pooling API (/pooling): Implement pooling.create for encoding prompts.
    • [ ] Classification API (/classify): Implement classification.create for text classification.
    • [ ] Score API (/score): Implement score.create for sentence pair scoring.
    • [ ] Re-rank API (/rerank, /v1/rerank, /v2/rerank): Implement rerank.create for relevance scoring.
    • [ ] Transcriptions API (/v1/audio/transcriptions): Implement audio.transcriptions.create for ASR models.
  • [ ] Implement Ollama-Specific Endpoints:
    • [ ] Create Model (/api/create): Implement models.create.
    • [ ] List Local Models (/api/tags): Implement models.list.
    • [ ] Show Model Information (/api/show): Implement models.info.
    • [ ] Copy Model (/api/copy): Implement models.copy.
    • [ ] Delete Model (/api/delete): Implement models.delete.
    • [ ] Pull Model (/api/pull): Implement models.pull.
    • [ ] Push Model (/api/push): Implement models.push.
    • [ ] Check Blob Exists (/api/blobs/:digest): Implement blobs.check.
    • [ ] Push Blob (/api/blobs/:digest): Implement blobs.push.
    • [ ] List Running Models (/api/ps): Implement models.running.
    • [ ] Version (/api/version): Implement version.
    • [ ] Legacy Embeddings (/api/embeddings): Support deprecated endpoint.
  • [ ] Handle Platform-Specific Parameters:
    • [ ] Support vLLM’s extra_body (e.g., top_k, guided_choice) and extra_headers.
    • [ ] Support Ollama’s options (e.g., num_ctx, seed) and format.
    • [ ] Map OpenAI parameters to platform-specific equivalents.
  • [ ] Implement Streaming Support:
    • [ ] Handle streaming for Completions and Chat Completions using requests with stream=True.
    • [ ] Parse and yield JSON objects incrementally in OpenAI-compatible format.
  • [ ] Handle Multi-Modal Inputs:
    • [ ] Support image inputs (base64-encoded) for vLLM (VLM2Vec) and Ollama (llava).
    • [ ] Validate and encode image data in requests.
  • [ ] Error Handling and Validation:
    • [ ] Implement HTTP error handling (400, 404, 500).
    • [ ] Use pydantic for input validation.
    • [ ] Handle platform-specific errors (e.g., vLLM’s missing chat template, Ollama’s model not found).
  • [ ] Implement Response Normalization (see the sketch just after this list):
    • [ ] Normalize vLLM and Ollama responses to OpenAI schemas (choices, usage, created).
    • [ ] Map vLLM’s data and Ollama’s response/message to choices.
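
As referenced in the normalization item above, a helper along these lines could map a non-streaming Ollama /api/chat response onto the OpenAI chat-completion shape. This is a minimal sketch, assuming a hypothetical helper name; the Ollama field names are the ones in its documented response:

import time
import uuid

def normalize_ollama_chat_response(raw: dict, model: str) -> dict:
    # Hypothetical helper: reshape Ollama's /api/chat fields into the
    # OpenAI chat-completion schema (choices, usage, created).
    prompt_tokens = raw.get("prompt_eval_count", 0)
    completion_tokens = raw.get("eval_count", 0)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": raw.get("message", {}),
                "finish_reason": "stop" if raw.get("done") else None,
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }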

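Similarly, the pydantic configuration model and factory-style initialization described in the earlier tasks might look roughly like this; class and field names are illustrative, not a final design:

from typing import Literal, Optional

from pydantic import BaseModel

class ClientConfig(BaseModel):
    # Settings common to both platforms.
    base_url: str
    api_key: Optional[str] = None
    model: Optional[str] = None
    platform: Literal["vllm", "ollama"]
    # Platform-specific extras (e.g., vLLM chat_template, Ollama keep_alive).
    chat_template: Optional[str] = None
    keep_alive: Optional[str] = None

class _VLLMHandler:
    # Placeholder: would own vLLM's /v1/... routes and extra_body handling.
    def __init__(self, config: ClientConfig):
        self.config = config

class _OllamaHandler:
    # Placeholder: would own Ollama's /api/... routes and options handling.
    def __init__(self, config: ClientConfig):
        self.config = config

class OpenAICompatibleClient:
    def __init__(self, base_url, platform, api_key=None, **kwargs):
        self.config = ClientConfig(base_url=base_url, api_key=api_key, platform=platform, **kwargs)
        # Factory step: select the handler for the chosen platform.
        handlers = {"vllm": _VLLMHandler, "ollama": _OllamaHandler}
        self._handler = handlers[self.config.platform](self.config)
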
Acceptance Criteria

  • [ ] Client supports all documented vLLM and Ollama endpoints with OpenAI-compatible interfaces.
  • [ ] Common endpoints (Completions, Chat Completions, Embeddings) work across providers.
  • [ ] Platform-specific endpoints are accessible via intuitive methods.
  • [ ] Multi-modal inputs (e.g., images) are supported where applicable.
  • [ ] Client can be used in place of openai.OpenAI with minimal code changes.

Example Usage

from agents.utils.openai_compatible_client import OpenAICompatibleClient

# vLLM client
vllm_client = OpenAICompatibleClient(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
    platform="vllm"
)
response = vllm_client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"top_k": 50}
)
print(response.choices[0].message.content)

# Ollama client
ollama_client = OpenAICompatibleClient(
    base_url="http://localhost:11434",
    platform="ollama"
)
response = ollama_client.completions.create(
    model="llama3.2",
    prompt="Why is the sky blue?",
    stream=False,
    options={"seed": 123}
)
print(response.choices[0].text)

# List Ollama models
models = ollama_client.models.list()
print([model["name"] for model in models["models"]])
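
# Streaming chat completion (illustrative extension of this example; assumes
# the proposed client yields OpenAI-style chunks with a delta field when
# stream=True, as described in the streaming task)
stream = vllm_client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about Factorio."}],
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)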

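An image-input example could follow the same OpenAI-style message format; the model name, file path, and content layout below are illustrative assumptions for the multi-modal task:

import base64

# Base64-encode a local image and pass it using the OpenAI image_url format
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = ollama_client.chat.completions.create(
    model="llava",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}}
        ]
    }]
)
print(response.choices[0].message.content)
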
Notes

  • vLLM Limitations: Handle unsupported parameters (e.g., suffix, parallel_tool_calls) with warnings or errors (a rough sketch follows this list).
  • Ollama Limitations: Account for the deprecated /api/embeddings endpoint and the deprecated context parameter.
  • Performance: Optimize for high QPS; vLLM warns that adding the X-Request-Id header can hurt performance at high request rates.
  • Extensibility: Design for future platform additions.
  • Integration with Existing Code: Ensure compatibility with LLMFactory in agents/utils/llm_factory.py, particularly for image support and message formatting.
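
For the vLLM limitations note above, one approach is to strip unsupported OpenAI parameters and emit a warning instead of forwarding them. A minimal sketch, assuming an illustrative helper name and an example (unverified) parameter set:

import warnings

# Example set of OpenAI parameters the vLLM backend would not forward
# (assumption for illustration; confirm the exact set against the vLLM docs).
_VLLM_UNSUPPORTED = {"suffix", "parallel_tool_calls"}

def filter_vllm_params(params: dict) -> dict:
    # Warn about unsupported keys and drop them rather than failing the request.
    for key in set(params) & _VLLM_UNSUPPORTED:
        warnings.warn(f"Parameter '{key}' is not supported by vLLM and will be ignored.")
    return {k: v for k, v in params.items() if k not in _VLLM_UNSUPPORTED}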

Josephrp · May 21 '25 14:05

https://ollama.com/blog/openai-compatibility

Have you seen this? It seems that we get some compatibility for free.

JackHopkins · May 21 '25 14:05

Yes, of course. I've made a bunch of these already, and it's really great. We'll also basically get Hugging Face inference client compatibility too (but I didn't want to put that in), and that's the one I'm interested in ;-)

Josephrp · May 21 '25 14:05

Huggingface compatibility would be really cool!

JackHopkins · May 21 '25 14:05