Real-Time Multimodal Pipelines for GenAI
Sign Up | Docs | Email List | Live Demo
Real-Time Embedding Pipelines
Keep your embeddings, metadata and text in sync with your source data, no matter where it lives or in what format.
Overview
Mixpeek listens for changes to your database, then processes each change (file_url or inline_content) through an inference pipeline of extraction, generation and embedding, leaving your database with fresh multimodal data, always.
It removes the need to set up your own architecture for tracking database changes, extracting content, processing and embedding it, and treating each change as its own atomic unit. Like if Airbyte and SageMaker had a baby.
Mixpeek supports every modality: documents, images, video, audio and of course text.
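The extraction, generation and embedding stages above can be sketched as plain functions. This is an illustrative toy, not Mixpeek's actual internals: the function names, the change-record shape, and the stand-in embedding are all made up for the example.

```python
# Toy sketch of the extraction -> generation -> embedding flow that runs
# for each database change. All names here are hypothetical.

def extract(change: dict) -> str:
    # Pull raw text out of the change payload. Here we only handle
    # inline_content; a real pipeline would also fetch and parse file_url.
    return change.get("inline_content", "")

def generate(text: str) -> dict:
    # Derive metadata from the extracted text (naive summary as a stand-in
    # for an LLM generation step).
    return {"summary": text[:50], "length": len(text)}

def embed(text: str) -> list[float]:
    # Stand-in embedding: a real pipeline would call a model such as OpenAI's.
    return [float(ord(c)) for c in text[:8]]

def process_change(change: dict) -> dict:
    # Each change is processed as its own atomic unit.
    text = extract(change)
    return {
        "text": text,
        "metadata": generate(text),
        "embedding": embed(text),
    }

record = process_change({"inline_content": "hello multimodal world"})
```

The key idea is that every change event flows through the same three stages, so the stored text, metadata and embedding can never drift apart.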
Integrations
- MongoDB: https://docs.mixpeek.com/integrations/mongodb
Architecture
You can choose to run it as a single node, or as a distributed queue using celery
Future plans include separate Docker images for inference workloads, which will eliminate the noisy-neighbor problem.
Getting Started
Clone the Mixpeek repository and navigate into it:
git clone [email protected]:mixpeek/server.git
cd server
Setup Environment
Update your .env file:
PYTHON_VERSION=3.10
OPENAI_KEY=
ENCRYPTION_KEY=
# DBs
MONGODB_URL=
REDIS_URL=
# cloud
MONGODB_ATLAS_PUBLIC_KEY=
MONGODB_ATLAS_PRIVATE_KEY=
MONGODB_ATLAS_GROUP_ID=
# AWS
AWS_ACCESS_KEY=
AWS_SECRET_KEY=
AWS_REGION=
AWS_ARN_LAMBDA=
SERVER_ENV=
SENTRY_DSN=
# Mixpeek
MIXPEEK_ADMIN_TOKEN=
USE_CELERY=true # completely optional
Run via Docker Compose
docker-compose build
docker-compose up
Run via Python
For each service you'll do the following:
- Create a virtual environment
poetry env use python3.11
- Activate the virtual environment
poetry shell
- Install the requirements
poetry install
- Run
poetry run uvicorn main:app --reload
Distributed Queue
Due to the nature of processing high volumes of changes from a database, we'll want to send the processes to a queue. Mixpeek currently supports Celery workers for this.
First set the .env variable USE_CELERY to true, then run Celery in a new terminal from the same directory.
Note: only applicable if you are not using Docker Compose.
celery -A db.service.celery_app worker --loglevel=info
Optionally, you can run multiple celery workers:
celery -A db.service.celery_app worker --loglevel=info -n worker1@%h &
celery -A db.service.celery_app worker --loglevel=info -n worker2@%h &
API Interface
All methods are exposed as HTTP endpoints.
- API swagger: https://api.mixpeek.com/docs/openapi.json
- API Documentation: https://docs.mixpeek.com
- Python SDK: https://github.com/mixpeek/mixpeek-python
You'll first need to generate an API key via POST /users/private, using the MIXPEEK_ADMIN_TOKEN you defined in the .env file:
curl --location 'http://localhost:8000/users/private' \
--header 'Authorization: MIXPEEK_ADMIN_TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{"email":"[email protected]"}'
Any email address will work.
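The same user-creation call can be made from Python with the standard library, assuming the server is running locally on port 8000 as in the curl example (the token value here is a placeholder for your own):

```python
import json
import urllib.request

MIXPEEK_ADMIN_TOKEN = "your-admin-token"  # placeholder: value from your .env

req = urllib.request.Request(
    "http://localhost:8000/users/private",
    data=json.dumps({"email": "[email protected]"}).encode(),
    headers={
        "Authorization": MIXPEEK_ADMIN_TOKEN,
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment with the server running
```

The response body contains the API key you'll use for all subsequent requests.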
Cloud Service
If you want a completely managed version of Mixpeek: https://mixpeek.com/start
We also have a transparent and predictable billing model: https://mixpeek.com/pricing
Are we missing anything?
- Email: [email protected]
- Schedule a Call: https://mixpeek.com/contact