
AI Gateway for LLMs at Scale

LLM Platform

Open Source LLM Platform to build and deploy applications at scale


Integrations & Configuration

LLM Providers

OpenAI Platform

https://platform.openai.com/docs/api-reference

providers:
  - type: openai
    token: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    models:
      gpt-3.5-turbo:
        id: gpt-3.5-turbo-1106

      gpt-4:
        id: gpt-4-1106-preview
        
      text-embedding-ada-002:
        id: text-embedding-ada-002
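With a provider configured, client traffic is routed through the gateway under the configured model alias. As a rough sketch (assuming the gateway listens on localhost:8080, as in the indexing examples further below, and exposes an OpenAI-compatible chat completions endpoint):

```shell
# Sketch only: gateway address and OpenAI-compatible endpoint are assumptions.
# The configured alias (gpt-3.5-turbo) is used as the model name.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```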

Azure OpenAI Service

https://azure.microsoft.com/en-us/products/ai-services/openai-service

providers:
  - type: openai
    url: https://xxxxxxxx.openai.azure.com
    token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      gpt-3.5-turbo:
        id: gpt-35-turbo-16k

      gpt-4:
        id: gpt-4-32k
        
      text-embedding-ada-002:
        id: text-embedding-ada-002

Anthropic

https://www.anthropic.com/api

providers:
  - type: anthropic
    token: sk-ant-apixx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    models:
      claude-3-opus:
        id: claude-3-opus-20240229

Ollama

https://ollama.ai

$ ollama serve
$ ollama run mistral

providers:
  - type: ollama
    url: http://localhost:11434

    models:
      mistral-7b-instruct:
        id: mistral

LLAMA.CPP

https://github.com/ggerganov/llama.cpp/tree/master/examples/server

# using taskfile.dev
$ task llama:server

# LLAMA.CPP Server
$ llama-server --port 9081 --log-disable --model ./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf

# LLAMA.CPP Server (Multimodal Model)
$ llama-server --port 9081 --log-disable --model ./models/llava-v1.5-7b-Q4_K.gguf --mmproj ./models/llava-v1.5-7b-mmproj-Q4_0.gguf

# using Docker (might be slow)
$ docker run -it --rm -p 9081:9081 -v ./models/:/models/ ghcr.io/ggerganov/llama.cpp:server --host 0.0.0.0 --port 9081 --model /models/mistral-7b-instruct-v0.2.Q4_K_M.gguf

providers:
  - type: llama
    url: http://localhost:9081

    models:
      mistral-7b-instruct:
        id: /models/mistral-7b-instruct-v0.2.Q4_K_M.gguf

WHISPER.CPP

https://github.com/ggerganov/whisper.cpp/tree/master/examples/server

# using taskfile.dev
$ task whisper:server

# WHISPER.CPP Server
$ whisper-server --port 9083 --convert --model ./models/whisper-ggml-medium.bin

providers:
  - type: whisper
    url: http://localhost:9083

    models:
      whisper:
        id: whisper

Hugging Face

https://huggingface.co/

providers:
  - type: huggingface
    url: https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.1

    models:
      mistral-7B-instruct:
        id: tgi

Mimic 3

https://github.com/MycroftAI/mimic3

# using Docker
$ mkdir -p models/mimic3
$ chmod 777 models/mimic3
$ docker run -it -p 59125:59125 -v $(pwd)/models/mimic3:/home/mimic3/.local/share/mycroft/mimic3 mycroftai/mimic3

providers:
  - type: mimic
    url: http://localhost:59125

    models:
      tts-1:
        id: mimic-3

Coqui

https://github.com/coqui-ai/TTS

$ docker run --rm -it -p 5002:5002 --platform linux/amd64 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
$ python3 TTS/server/server.py --list_models
$ python3 TTS/server/server.py --model_name tts_models/en/vctk/vits

providers:
  - type: coqui
    url: http://localhost:5002

    models:
      coqui-1:
        id: coqui-1

LangChain / LangServe

https://python.langchain.com/docs/langserve

providers:
  - type: langchain
    url: http://your-langchain-server:8000

    models:
      langchain:
        id: default

Vector Databases / Indexes

Chroma

https://www.trychroma.com

# using Docker
$ docker run -it --rm -p 9083:8000 -v chroma-data:/chroma/chroma ghcr.io/chroma-core/chroma

indexes:
  docs:
    type: chroma
    url: http://localhost:9083
    namespace: docs
    embedding: text-embedding-ada-002

Weaviate

https://weaviate.io

# using Docker
$ docker run -it --rm -p 9084:8080 -v weaviate-data:/var/lib/weaviate -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true -e PERSISTENCE_DATA_PATH=/var/lib/weaviate semitechnologies/weaviate

indexes:
  docs:
    type: weaviate
    url: http://localhost:9084
    namespace: Document
    embedding: text-embedding-ada-002

In-Memory

indexes:
  docs:
    type: memory
    embedding: text-embedding-ada-002

OpenSearch / Elasticsearch

https://opensearch.org

# using Docker
$ docker run -it --rm -p 9200:9200 -v opensearch-data:/usr/share/opensearch/data -e "discovery.type=single-node" -e DISABLE_SECURITY_PLUGIN=true opensearchproject/opensearch:latest

indexes:
  docs:
    type: elasticsearch
    url: http://localhost:9200
    namespace: docs

Extractors

Text

extractors:
  text:
    type: text

Code

Supported Languages:

  • C#
  • C++
  • Go
  • Java
  • JavaScript
  • Kotlin
  • Python
  • Ruby
  • Rust
  • Scala
  • Swift
  • TypeScript

extractors:
  code:
    type: code

Tesseract

https://tesseract-ocr.github.io

# using Docker
$ docker run -it --rm -p 9086:8884 hertzg/tesseract-server:latest

extractors:
  tesseract:
    type: tesseract
    url: http://localhost:9086

Unstructured

https://unstructured.io

# using Docker
$ docker run -it --rm -p 9085:8000 quay.io/unstructured-io/unstructured-api:0.0.64 --port 8000 --host 0.0.0.0

extractors:
  unstructured:
    type: unstructured
    url: http://localhost:9085

Classifications

LLM Classifier

classifiers:
  {classifier-id}:
    type: llm
    model: mistral-7b-instruct
    classes:
      class-1: "...Description when to use Class 1..."
      class-2: "...Description when to use Class 2..."

Use Cases

Retrieval Augmented Generation (RAG)

Configuration

chains:
  qa:
    type: rag
    index: docs
    model: mistral-7b-instruct

    # limit: 10
    # distance: 1

    # filters:
    #  {metadata-key}:
    #    classifier: {classifier-id}
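A configured chain can then be queried like any other model. A sketch, assuming chains are addressed by their id (here qa) via the gateway's OpenAI-compatible chat completions endpoint on localhost:8080:

```shell
# Sketch only: the endpoint path and chain-as-model-name behavior are assumptions.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qa",
    "messages": [{"role": "user", "content": "What do the docs say about setup?"}]
  }'
```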

Index Documents

Using Extractor

POST http://localhost:8080/v1/index/{index-name}/{extractor}
Content-Type: application/pdf
Content-Disposition: attachment; filename="filename.pdf"
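The same request as a curl call; docs and the unstructured extractor are example values from this document, and filename.pdf is a placeholder:

```shell
# Upload a PDF to an index, letting the named extractor convert it to text.
curl -X POST http://localhost:8080/v1/index/docs/unstructured \
  -H "Content-Type: application/pdf" \
  -H 'Content-Disposition: attachment; filename="filename.pdf"' \
  --data-binary @filename.pdf
```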

Using Documents

POST http://localhost:8080/v1/index/{index-name}
[
    {
        "id": "id1",
        "content": "content of document...",
        "metadata": {
          "key1": "value1",
          "key2": "value2"
        }
    },
    {
        "id": "id2",
        "content": "content of document...",
        "metadata": {
          "key1": "value1",
          "key2": "value2"
        }
    }
]
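As a curl call, using docs as an example index name:

```shell
# Index pre-extracted documents directly as JSON.
curl -X POST http://localhost:8080/v1/index/docs \
  -H "Content-Type: application/json" \
  -d '[
    {
      "id": "id1",
      "content": "content of document...",
      "metadata": {"key1": "value1", "key2": "value2"}
    }
  ]'
```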

Function Calling

Hermes Function Calling

providers:
  - type: llama
    url: http://localhost:9081

    models:
      hermes-2-pro:
        id: /models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf
        adapter: hermesfn
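A sketch of calling such a model with OpenAI-style tool definitions, assuming the hermesfn adapter translates them into the Hermes function-calling format; get_current_weather is a made-up example tool, not part of this document:

```shell
# Sketch only: assumes OpenAI-compatible "tools" support on the gateway.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-2-pro",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```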