
Real-Time Multimodal Pipelines for GenAI


Sign Up | Docs | Email List | Live Demo


Real-Time Embedding Pipelines

Keep your embeddings, metadata, and text in sync with your source data, no matter where it lives or what format it's in.

Overview

Mixpeek listens for changes to your database, then processes each change (file_url or inline_content) through an inference pipeline of extraction, generation, and embedding, leaving your database with fresh multimodal data, always.

It removes the need to set up infrastructure that tracks database changes, extracts content, processes and embeds it, and treats each change as its own atomic unit. Like if Airbyte and SageMaker had a baby.

Mixpeek supports every modality: documents, images, video, audio and of course text.
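
To make the flow concrete, here's a minimal Python sketch of the extraction, generation, and embedding steps described above. The helper functions and payload shape are illustrative assumptions, not Mixpeek's actual internals:

# Conceptual sketch only; the helpers stand in for real extraction,
# generation, and embedding logic.
from typing import Any

def extract(source: str) -> str:
    """Stand-in for content extraction (PDF parsing, OCR, transcription, ...)."""
    return f"text extracted from {source}"

def generate(text: str) -> dict[str, Any]:
    """Stand-in for metadata generation (summaries, tags, classifications)."""
    return {"summary": text[:80]}

def embed(text: str) -> list[float]:
    """Stand-in for producing an embedding vector."""
    return [0.0] * 768

def process_change(change: dict[str, Any]) -> dict[str, Any]:
    """Run one database change through the full pipeline."""
    source = change.get("file_url") or change.get("inline_content", "")
    text = extract(source)
    return {**change, "metadata": generate(text), "embedding": embed(text)}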

Integrations

  • MongoDB: https://docs.mixpeek.com/integrations/mongodb
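
For context, this is roughly the kind of change listening that Mixpeek automates for MongoDB, sketched here with pymongo's change streams. The database and collection names, and the enrichment step, are placeholders for illustration only:

# Sketch: watch a MongoDB collection for changes (requires a replica set or Atlas).
from pymongo import MongoClient

client = MongoClient("<MONGODB_URL>")
collection = client["mydb"]["documents"]  # placeholder names

with collection.watch(full_document="updateLookup") as stream:
    for event in stream:
        doc = event.get("fullDocument")
        if doc and (doc.get("file_url") or doc.get("inline_content")):
            print("would enrich:", doc["_id"])  # Mixpeek's pipeline runs here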

Architecture

You can run Mixpeek as a single node, or as a distributed queue using Celery.

Future plans include separate Docker images for inference workloads, which will eliminate the noisy-neighbor problem.

Getting Started

Clone the Mixpeek server repository and navigate into it:

git clone git@github.com:mixpeek/server.git
cd mixpeek

Setup Environment

Update your .env file:

PYTHON_VERSION=3.10
OPENAI_KEY=
ENCRYPTION_KEY=

# DBs
MONGODB_URL=
REDIS_URL=

# cloud
MONGODB_ATLAS_PUBLIC_KEY=
MONGODB_ATLAS_PRIVATE_KEY=
MONGODB_ATLAS_GROUP_ID=

# AWS
AWS_ACCESS_KEY=
AWS_SECRET_KEY=
AWS_REGION=
AWS_ARN_LAMBDA=

SERVER_ENV=
SENTRY_DSN=

# Mixpeek
MIXPEEK_ADMIN_TOKEN=
USE_CELERY=true # completely optional
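
If you're wiring these values into your own scripts, they can be read the usual way with python-dotenv. The snippet below is a generic sketch, not Mixpeek's own settings code:

# Generic example of reading the .env values above (pip install python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the current directory

MONGODB_URL = os.getenv("MONGODB_URL", "")
REDIS_URL = os.getenv("REDIS_URL", "")
USE_CELERY = os.getenv("USE_CELERY", "false").lower() == "true"

if not MONGODB_URL:
    raise RuntimeError("MONGODB_URL must be set in .env")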

Run via Docker Compose

docker-compose build
docker-compose up

Run via Python

For each service you'll do the following:

  1. Create a virtual environment
poetry env use python3.11
  2. Activate the virtual environment
poetry shell
  3. Install the requirements
poetry install
  4. Run
poetry run uvicorn main:app --reload

Distributed Queue

Because a database can produce a high volume of changes, you'll want to send the processing work to a queue. Mixpeek currently supports Celery workers for this.

First set the .env variable USE_CELERY to true, then in a new terminal run Celery from within the same mixpeek directory.

Note: this is only applicable if you're not using Docker Compose.

celery -A db.service.celery_app worker --loglevel=info

Optionally, you can run multiple celery workers:

celery -A db.service.celery_app worker --loglevel=info -n worker1@%h &
celery -A db.service.celery_app worker --loglevel=info -n worker2@%h &
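
For a sense of what those workers consume, here's an illustrative Celery task plus the producer-side call that would enqueue a change. The task name and payload are hypothetical; Mixpeek's real tasks live in db.service.celery_app:

# Illustrative only: a hypothetical task and how a change would be enqueued.
from celery import Celery

celery_app = Celery("mixpeek", broker="<REDIS_URL>", backend="<REDIS_URL>")

@celery_app.task(name="process_change")
def process_change(change: dict) -> None:
    # extraction -> generation -> embedding for a single change
    ...

# Producer side: enqueue a change instead of processing it inline
process_change.delay({"file_url": "https://example.com/report.pdf"})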

API Interface

All methods are exposed as HTTP endpoints.

  • API swagger: https://api.mixpeek.com/docs/openapi.json
  • API Documentation: https://docs.mixpeek.com
  • Python SDK: https://github.com/mixpeek/mixpeek-python

You'll first need to generate an API key via POST /users/private, using the MIXPEEK_ADMIN_TOKEN you defined in the .env file:

curl --location 'http://localhost:8000/users/private' \
--header 'Authorization: MIXPEEK_ADMIN_TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{"email":"[email protected]"}'

You can use any email address; it doesn't matter.
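
The same call from Python, if you prefer it over curl (requests is just one option; any HTTP client works):

import requests

resp = requests.post(
    "http://localhost:8000/users/private",
    headers={"Authorization": "MIXPEEK_ADMIN_TOKEN"},  # the admin token from your .env
    json={"email": "[email protected]"},
)
print(resp.status_code, resp.json())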

Cloud Service

If you want a completely managed version of Mixpeek: https://mixpeek.com/start

We also have a transparent and predictable billing model: https://mixpeek.com/pricing

Are we missing anything?