Retrieval-Augmented-QA-Demo
Query, ask, and chat with a document index via transformer models and a friendly web UI!
Installation
Simply run it via Docker Compose. An example configuration can be found in the docker-compose.yml file.
The UI is then available at http://localhost:8501 and the API's Swagger documentation at http://localhost:8001/docs.
If you want to build it from source, just download the repo and use the docker-compose-src-*.yml files.
An example .env config can be found in .env.example.
All modules are available as prebuilt containers via the GitHub container registry.
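Once the stack is running, you can query the API directly. Here is a minimal sketch using `requests` — note that the `/search` route and payload shape below are hypothetical placeholders for illustration; the real routes and schemas are listed in the Swagger documentation at http://localhost:8001/docs.

```python
import requests

API_URL = "http://localhost:8001"

# NOTE: "/search" and its payload are hypothetical placeholders;
# check http://localhost:8001/docs for the actual routes and schemas.
response = requests.post(
    f"{API_URL}/search",
    json={"query": "Who wrote Faust?", "top_k": 5},
    timeout=30,
)
response.raise_for_status()
for hit in response.json():
    print(hit)
```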
Overview
The demo has three main components:
- Streamlit web UI
- FastAPI endpoint, which hosts the models and QA pipelines
- Importers, which import text files into the system
Chat Module
The chat module allows the user either to use an LLM to query and summarize retrieved documents from the document index or simply to chat with the model normally. It supports multiple chat adapters, which are exposed via a streaming API (see the client sketch after this list). The following libraries are supported:
- OpenAI: Uses the default ChatGPT API. To use this adapter, an OpenAI API token has to be supplied.
- Huggingface: Supports nearly all LLMs on the Hugging Face Hub, including PEFT fine-tuned models. To run this, a GPU needs to be passed to the container running the API.
- llama-rs: Can run GGML-converted models like Alpaca on a CPU with relatively low resource usage. Use this adapter if you don't have a GPU.
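Regardless of the adapter, a client consumes the streaming API chunk by chunk. A rough sketch — the `/chat` route and payload here are assumptions for illustration, so consult the Swagger docs for the real endpoint:

```python
import requests

# Hypothetical streaming chat endpoint; the real route and payload
# are documented at http://localhost:8001/docs.
with requests.post(
    "http://localhost:8001/chat",
    json={"message": "Summarize the retrieved documents."},
    stream=True,
    timeout=120,
) as response:
    response.raise_for_status()
    # Tokens arrive incrementally; print them as they stream in.
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```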
Semantic Search & Extractive QA Module
The semantic search and extractive QA modules use Haystack to query the Elasticsearch database. Nearly all embedding and QA models on the Hugging Face Hub are supported. By default they are executed on the CPU, as the GPU is commonly allocated to the chat model.
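For reference, such a pipeline looks roughly like this in Haystack 1.x, using the default models and settings from the tables below (a sketch of the general technique, not necessarily the project's exact wiring):

```python
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import EmbeddingRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Matches the defaults below: 384-dim embeddings, cosine similarity.
document_store = ElasticsearchDocumentStore(
    host="localhost",
    port=9200,
    embedding_dim=384,
    similarity="cosine",
)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="LLukas22/all-MiniLM-L12-v2-embedding-all",
    use_gpu=False,  # CPU by default; the GPU is reserved for the chat model
)
reader = FARMReader(
    model_name_or_path="LLukas22/all-MiniLM-L12-v2-qa-en",
    use_gpu=False,
)

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
result = pipeline.run(
    query="What is retrieval-augmented generation?",
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}},
)
for answer in result["answers"]:
    print(answer.answer, answer.score)
```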
Importer
The importers collect text documents and convert them to Haystack Documents, which are then committed to the database.
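In Haystack 1.x terms, an importer essentially does something like the following (a simplified sketch):

```python
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.schema import Document

document_store = ElasticsearchDocumentStore(
    host="localhost", port=9200, embedding_dim=384
)

# Wrap raw text in Haystack Documents, then commit them to the index.
docs = [
    Document(content="Some article text...", meta={"title": "Example"}),
]
document_store.write_documents(docs)
# Embeddings can then be computed in a separate pass, e.g. via
# document_store.update_embeddings(retriever).
```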
Wiki-Importer
Uses a Wikipedia minidump to populate the database with about 500,000 Wikipedia articles.
ST4-Importer
Can be used to import technical documentation which was created via the Schema ST4 software.
Settings and Environment
To be able to run this repo on different hardware configurations, many settings are configurable via environment variables.
API:
| Environment Variable | Default | Description |
|---|---|---|
| HUGGINGFACE_TOKEN | None | Huggingface token |
| ELASTICSEARCH_HOST | localhost | Elasticsearch host address |
| ELASTICSEARCH_PORT | 9200 | Elasticsearch port number |
| ELASTICSEARCH_USER | None | Elasticsearch user |
| ELASTICSEARCH_PASSWORD | None | Elasticsearch password |
| EMBEDDING_DIM | 384 | Embedding dimension |
| SIMILARITY | cosine | Similarity measure |
| EMBEDDING_MODEL | LLukas22/all-MiniLM-L12-v2-embedding-all | Embedding model |
| EXTRACTIVE_QA_MODEL | LLukas22/all-MiniLM-L12-v2-qa-en | Extractive QA model |
| USE_GPU | False | Use GPU for QA and embedding |
| USE_8BIT | False | Use bitsandbytes 8-bit quantization |
| CONCURENCY_LIMIT | 5 | Concurrency limit of the API |
| DEBUG | True | Debug mode |
| CHATMODEL | CPU | Chat adapter to use (OPENAI, GPU, CPU) |
| CHAT_MAX_INPUT_LENGTH | 2000 | Chat max input length |
| OPENAI_TOKEN | None | OpenAI token |
| BASE_CHAT_MODEL | decapoda-research/llama-7b-hf | Base chat model |
| USE_PEFT | True | Use PEFT |
| ADAPTER_CHAT_MODEL | tloen/alpaca-lora-7b | Adapter chat model |
| ADAPTER_APPLY_OPTIMIZATIONS | True | Apply Torch optimizations |
| CPU_MODEL_REPO | Sosaka/Alpaca-native-4bit-ggml | CPU model repository |
| CPU_MODEL_FILENAME | ggml-alpaca-7b-q4.bin | CPU model filename |
| CPU_MODEL_THREADS | 8 | CPU model threads |
| CPU_MODEL_KV_16 | True | Use f16 for the CPU model's KV store |
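The API presumably reads these variables at startup along the following lines (a sketch assuming plain `os.getenv` with the defaults from the table above; the actual loading code may differ):

```python
import os

# Defaults mirror the API table above.
ELASTICSEARCH_HOST = os.getenv("ELASTICSEARCH_HOST", "localhost")
ELASTICSEARCH_PORT = int(os.getenv("ELASTICSEARCH_PORT", "9200"))
EMBEDDING_DIM = int(os.getenv("EMBEDDING_DIM", "384"))
SIMILARITY = os.getenv("SIMILARITY", "cosine")
USE_GPU = os.getenv("USE_GPU", "False").lower() == "true"
CHATMODEL = os.getenv("CHATMODEL", "CPU")  # one of OPENAI, GPU, CPU
```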
UI:
| Environment Variable | Default |
|---|---|
| SYSTEM_PROMPT | The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know. \n\n Current Conversation: |
| CHAT_WELCOME_MESSAGE | Hello, I will try to answer any questions you have for me! Untick the checkbox to disable the document search and chat with me normally. |
| WELCOME_MESSAGE | This demo was initialized with about 500,000 English Wikipedia articles from April 1st, 2023. Feel free to ask the system about any topic you like. \n\n ⚠️CAUTION: If offline models are used, no safety layers are in place. If you ask the system about an offensive topic, it will answer you, even if the answer is immoral!⚠️ |
| API_HOST | localhost |
| API_PORT | 8001 |
| ENABLE_ADMIN | False |
Wiki-Importer:
| Environment Variable | Default | Description |
|---|---|---|
| ELASTIC_HOST | localhost | Elasticsearch host address |
| ELASTIC_PORT | 9200 | Elasticsearch port number |
| ELASTIC_EMBEDDING_DIM | 384 | Embedding dimension |
| CACHE_DIR | ./importer_cache | Cache directory path |
| WIKI_URL | https://dumps.wikimedia.org/simplewiki/20230401/simplewiki-20230401-pages-articles-multistream.xml.bz2 | URL of the Wiki-dump to download |
ST4-Importer:
| Environment Variable | Default | Description |
|---|---|---|
| ELASTIC_HOST | localhost | Elasticsearch host address |
| ELASTIC_PORT | 9200 | Elasticsearch port number |
| ELASTIC_EMBEDDING_DIM | 384 | Embedding dimension |
| ST4_FOLDER | ./.st4_files | ST4 files folder path |