gcp-redis-llm-stack
Reference architecture for LLM-based applications on Google Cloud Platform with Redis Enterprise as a high-performance data layer.
Scalable LLM Architectures with Redis & GCP Vertex AI
☁️ Generative AI with Google Vertex AI comes with a specialized in-console studio experience, a dedicated API for Gemini, and an easy-to-use Python SDK for deploying and managing instances of Google's powerful language models.
⚡ Redis Enterprise offers fast, scalable vector search, with an API for index creation and management, blazing-fast queries, and hybrid filtering. Coupled with its versatile data structures, Redis Enterprise shines as a high-performance data layer for building high-quality Large Language Model (LLM) apps.
This repo serves as a foundational architecture for building LLM applications with Redis and GCP services.
Reference architecture

- Primary Data Sources
- Data Extraction and Loading
- Large Language Models
  - `text-embedding-gecko@003` for embeddings
  - `gemini-1.0-pro-001` for LLM generation and chat
- High-Performance Data Layer (Redis)
- Semantic caching to improve LLM performance and reduce associated costs
- Vector search for context retrieval from knowledge base
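The semantic-caching idea above can be illustrated with a minimal in-memory sketch. This is a toy stand-in, not the real data layer: in this stack the cache entries would live behind a Redis vector index (e.g. via RedisVL's semantic cache), and the embeddings would come from `text-embedding-gecko@003` rather than being hard-coded vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy semantic cache: return a stored LLM answer when a new prompt's
    embedding is close enough to a previously cached prompt's embedding."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, embedding):
        if not self.entries:
            return None
        best = max(self.entries, key=lambda e: cosine_similarity(e[0], embedding))
        if cosine_similarity(best[0], embedding) >= self.threshold:
            return best[1]  # cache hit: skip the expensive LLM call
        return None  # cache miss: call the LLM, then put() the answer

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "Redis is an in-memory data store.")
print(cache.get([0.99, 0.05]))  # near-duplicate prompt -> cached answer
print(cache.get([0.0, 1.0]))    # unrelated prompt -> None (call the LLM)
```

The point of the sketch: semantically similar prompts (high cosine similarity between embeddings) reuse an earlier answer, so repeated or rephrased questions never hit the model.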
RAG demo
Open the code tutorial in the Colab notebook to get your hands dirty with Redis and Vertex AI on GCP. It's a step-by-step walkthrough of setting up the required data, generating embeddings, and building RAG from scratch, highlighting Redis vector search and semantic caching along the way.
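The retrieval step of the RAG flow can be sketched in a few lines. Everything here is a hypothetical stand-in to show the shape of the pipeline: `embed` is a bag-of-words stub in place of `text-embedding-gecko@003`, the brute-force similarity scan stands in for a Redis vector index query, and the assembled prompt would then go to `gemini-1.0-pro-001` for grounded generation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical embedding stub: a bag-of-words count vector.
    The real pipeline would call text-embedding-gecko@003 instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy knowledge base; in the real stack each chunk and its embedding
# live in Redis behind a vector index.
docs = [
    "Redis Enterprise provides vector search and semantic caching.",
    "Vertex AI hosts Google's Gemini family of language models.",
    "Streamlit makes it easy to build data apps in Python.",
]
index = [(doc, embed(doc)) for doc in docs]


def retrieve(question, k=1):
    """Return the k chunks most similar to the question (the RAG retrieval step)."""
    q = embed(question)
    ranked = sorted(index, key=lambda d: cosine(d[1], q), reverse=True)
    return [doc for doc, _ in ranked[:k]]


question = "Which models does Vertex AI host?"
context = retrieve(question)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
# `prompt` would now be sent to gemini-1.0-pro-001 for grounded generation.
print(context)
```

Swapping the stub pieces for real embeddings and a Redis vector query is exactly what the notebook walks through step by step.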
Additional resources
- Streamlit PDF chatbot example app
- Redis vector search documentation
- Get started with RedisVL
- Google VertexAI resources
- More Redis AI resources