Chat With PDFs
Chat with your PDF files for free, using Langchain, Groq, Chroma vector store, and Jina AI embeddings. This repository contains a simple Python implementation of the RAG (Retrieval-Augmented-Generation) system. The RAG model is used to retrieve relevant chunks of the user PDF file based on user queries and provide informative responses.
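At a high level, the pipeline loads a PDF, splits it into chunks, embeds the chunks into a Chroma vector store with Jina embeddings, and answers questions with a Groq-hosted model using the retrieved chunks as context. The sketch below shows how these pieces typically fit together with LangChain; it is not the repository's exact code, and the file path, model names, chunk size, collection name, and store directory are placeholders.

```python
# Minimal RAG sketch: load a PDF, index it in Chroma with Jina embeddings,
# and answer a question with a Groq-hosted LLM via LangChain.
# Illustrative only -- paths, model names, and parameters are placeholders.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_groq import ChatGroq
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the PDF into chunks small enough for the embedding model.
docs = PyPDFLoader("documents/example.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks and persist them in a local Chroma collection.
embeddings = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")  # reads JINA_API_KEY
store = Chroma.from_documents(chunks, embeddings, collection_name="pdf_chunks", persist_directory="vectorstore")

# Retrieve the most relevant chunks and hand them to the LLM as context.
llm = ChatGroq(model_name="llama3-70b-8192", temperature=0.0)  # reads GROQ_API_KEY
question = "What is this document about?"
context = "\n\n".join(d.page_content for d in store.similarity_search(question, k=4))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```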
Installation
Follow these steps:
- Clone the repository:
  git clone https://github.com/S4mpl3r/chat-with-pdf.git
- Create a virtual environment and activate it (optional, but highly recommended):
  python -m venv .venv
  Windows: .venv\Scripts\activate
  Linux: source .venv/bin/activate
- Install the required packages:
  python -m pip install -r requirements.txt
- Create a .env file in the root of the project and populate it with the following keys (you'll need to obtain your own API keys); a quick sanity check is sketched after this list:
  JINA_API_KEY=<YOUR KEY>
  GROQ_API_KEY=<YOUR KEY>
  HF_TOKEN=<YOUR TOKEN>
  HF_HOME=<PATH TO STORE HUGGINGFACE MODEL>
- Run the program:
  python main.py
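If you want to confirm the .env file is being picked up before running the app, a small check like the following works. It assumes the python-dotenv package is installed and uses the variable names listed above.

```python
# Quick sanity check that the keys from .env are visible to the process.
# Assumes the python-dotenv package is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current working directory

for key in ("JINA_API_KEY", "GROQ_API_KEY", "HF_TOKEN", "HF_HOME"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```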
Configuration
You can customize the behavior of the system by modifying the constants and parameters in the main.py file (an illustrative layout is sketched after this list):
- EMBED_MODEL_NAME: Specify the name of the Jina embedding model to be used.
- LLM_NAME: Specify the name of the language model (Refer to Groq for the list of available models).
- LLM_TEMPERATURE: Set the temperature parameter for the language model.
- CHUNK_SIZE: Set the chunk size used when splitting documents; it should not exceed the maximum input size of the embedding model.
- DOCUMENT_DIR: Specify the directory where PDF documents are stored.
- VECTOR_STORE_DIR: Specify the directory where vector embeddings are stored.
- COLLECTION_NAME: Specify the name of the collection for the Chroma vector store.
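For reference, these constants might be grouped near the top of main.py roughly as follows. The names match the list above, but the values shown are only illustrative defaults, not necessarily the ones shipped in the repository.

```python
# Illustrative configuration block -- the names match the list above,
# but these particular values are only examples.
EMBED_MODEL_NAME = "jina-embeddings-v2-base-en"  # Jina embedding model
LLM_NAME = "llama3-70b-8192"                     # Groq-hosted language model
LLM_TEMPERATURE = 0.0                            # 0 = deterministic answers
CHUNK_SIZE = 1000                                # characters per chunk when splitting PDFs
DOCUMENT_DIR = "documents"                       # where input PDFs live
VECTOR_STORE_DIR = "vectorstore"                 # where Chroma persists embeddings
COLLECTION_NAME = "pdf_collection"               # Chroma collection name
```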
Resources
Kudos to the amazing libraries and services that power this project: LangChain, Groq, Chroma, Jina AI, and Hugging Face.
License
MIT