redis-arXiv-search
Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.
🔎 Redis arXiv Search
This repository is the official codebase for the arXiv paper search app hosted at: https://docsearch.redisvl.com
Redis is a highly performant, production-ready vector database, which can be used for many types of applications. Here we showcase Redis vector search applied to a document retrieval use case. Read more about AI-powered search in the technical blog post published by our partners, Data Science Dojo.
Dataset
The arXiv papers dataset was sourced from Kaggle. arXiv is commonly used for scientific research in a variety of fields. Exposing a semantic search layer lets users discover relevant papers with natural-language queries.

Application
This app was built as a Single Page Application (SPA) with the following components:
- Redis Stack as the vector database
- RedisVL as the Python vector database client
- FastAPI for the Python API
- Pydantic for schema and validation
- React (with TypeScript)
- Docker Compose for development
- MaterialUI for some UI elements/components
- React-Bootstrap for some UI elements
- HuggingFace, OpenAI, and Cohere for vector embedding creation
Some inspiration was taken from this Cookiecutter project, which was adapted into a SPA instead of using a separate front-end server.
Embedding Providers
Embeddings represent the semantic properties of the raw text and enable vector similarity search. This application supports HuggingFace, OpenAI, and Cohere embeddings out of the box.
| Provider | Embedding Model | Required? |
|---|---|---|
| HuggingFace | sentence-transformers/all-mpnet-base-v2 | Yes |
| OpenAI | text-embedding-ada-002 | Yes |
| Cohere | embed-multilingual-v3.0 | Yes |
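Vector similarity search ranks papers by how close their embedding vectors are to the embedding of the query. As a minimal illustration, assuming cosine similarity as the metric (a common choice; the metric this app configures in Redis is not stated here) and using tiny made-up 3-dimensional vectors in place of real model output, which has hundreds of dimensions, the ranking works like this:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], docs: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" for illustration only; not real model output.
papers = {
    "graph-neural-nets": [0.9, 0.1, 0.0],
    "protein-folding": [0.1, 0.9, 0.2],
    "transformer-survey": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]

for doc_id, score in rank_by_similarity(query_vector, papers):
    print(f"{doc_id}: {score:.3f}")
```

A vector database does the same ranking at scale with an index instead of a linear scan. Note that if a Redis vector field uses the COSINE metric, Redis reports a cosine distance (1 minus the similarity), so smaller scores mean closer matches.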
Interested in a different embedding provider? Feel free to open a PR and make a suggested addition.
Want to use a different model than the one listed? Set the following environment variables in your .env file (see below):
- SENTENCE_TRANSFORMER_MODEL
- OPENAI_EMBEDDING_MODEL
- COHERE_EMBEDDING_MODEL
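The override pattern can be sketched as follows, assuming the app reads these variables from the environment and falls back to the defaults in the table above when they are unset or empty. The helper resolve_embedding_models is hypothetical and not taken from the codebase; only the variable names and default model names come from this README.

```python
import os
from typing import Mapping

# Defaults copied from the Embedding Providers table; env var names from this README.
DEFAULT_MODELS = {
    "SENTENCE_TRANSFORMER_MODEL": "sentence-transformers/all-mpnet-base-v2",
    "OPENAI_EMBEDDING_MODEL": "text-embedding-ada-002",
    "COHERE_EMBEDDING_MODEL": "embed-multilingual-v3.0",
}


def resolve_embedding_models(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: use an override when set and non-empty, else the default."""
    return {name: (env.get(name) or default) for name, default in DEFAULT_MODELS.items()}


if __name__ == "__main__":
    # "my-custom-model" is a placeholder name, not a real model.
    print(resolve_embedding_models({"OPENAI_EMBEDDING_MODEL": "my-custom-model"}))
```

If you switch models, keep in mind that a different model usually produces vectors of a different dimension, so the stored paper embeddings would need to be regenerated with the same model.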
🚀 Running the App
- Before running the app, install Docker Desktop.
- Clone (and optionally fork) this GitHub repo to your machine:
$ git clone https://github.com/RedisVentures/redis-arXiv-search.git
- Make a copy of the .env.template file:
$ cd redis-arXiv-search/
$ cp .env.template .env
- Add your OPENAI_API_KEY to the .env file. Need one? Get an API key.
- Add your COHERE_API_KEY to the .env file. Need one? Get an API key.
- Decide which Redis you plan to use, then choose one of the methods below:
  - Redis Stack runs Redis as a local Docker container.
  - Redis Cloud will manage a Redis database on your behalf in the cloud.
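Before starting the containers, it can help to confirm that the .env file contains the API keys from the steps above. The script below is a hypothetical helper, not part of the repo. It assumes simple KEY=value lines, skipping blanks and # comments, and does not handle quoting, export prefixes, or multi-line values the way a full dotenv parser would. If you use Redis Cloud, you could pass REDIS_HOST, REDIS_PASSWORD, and REDIS_PORT in the required tuple as well.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY", "COHERE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or set to an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(env_path.read_text())
        print("Missing:", ", ".join(missing) if missing else "none")
    else:
        print(".env not found; copy .env.template first")
```

Run it from the redis-arXiv-search/ directory so that the relative .env path resolves.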
Redis Stack Docker (Local)
Using Redis Stack locally doesn't require any additional steps. However, it will consume more resources on your machine and have performance limitations.
Use the provided docker-compose file for running the application locally:
$ docker compose -f docker-local-redis.yml up
Redis Cloud
- Get a FREE Redis Cloud database. Make sure to include the Search module.
- Add the REDIS_HOST, REDIS_PASSWORD, and REDIS_PORT environment variables to your .env file.
- Run the app:
$ docker compose -f docker-cloud-redis.yml up
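For reference, the three Redis Cloud variables combine into a standard redis:// connection URL, the form accepted by redis-py's Redis.from_url. Whether this app builds its connection exactly this way is an assumption, and build_redis_url is a hypothetical helper. The password is percent-encoded so characters such as @ or / don't break the URL; redis-py decodes it again when parsing.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: str, password: str = "") -> str:
    """Hypothetical helper: compose a redis:// URL from REDIS_HOST, REDIS_PORT, REDIS_PASSWORD."""
    # Password-only auth uses an empty username: redis://:password@host:port
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}"


if __name__ == "__main__":
    # Placeholder values for illustration; in practice read them from your .env file.
    print(build_redis_url("localhost", "6379"))
```

Keeping the credentials in .env rather than hard-coding a URL means the same image can point at a local Redis Stack or a Redis Cloud database without a rebuild.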
Customizing (optional)
- You can use the provided Jupyter Notebook in the data/ directory to create paper embeddings and metadata. The output JSON files are stored in the data/ directory and used when building your own container.
- Use the ./build.sh script to build your own Docker image based on your changes to the application source code and dataset.
- If you want to use K8s instead of Docker Compose, we have some resources to help you get started.
React Dev Environment
It's typically easier to build the front end in an interactive environment where you can test changes in real time.
- Deploy the app using the steps above.
- Install packages (you may need to use npm to install yarn):
$ cd frontend/
$ yarn install --no-optional
- Use yarn to serve the application from your machine:
$ yarn start
- Navigate to http://localhost:3000 in a browser.
Changes to your front-end code will be reflected in the browser in near real time.
Troubleshooting
Every once in a while you may need to clear out cached Docker artifacts. Run docker system prune, restart Docker Desktop, and try again.
This project is maintained by Redis on a good-faith basis. Please open an issue here on GitHub and we will try to respond.
