llama2gptq

Chat with LLaMa 2 and get responses backed by reference documents retrieved from a vector database. The model runs locally using GPTQ 4-bit quantization.

LLaMa2 GPTQ

A chat AI that answers with reference documents by applying prompt engineering over a vector database. It also suggests related web pages through integration with my previous product, Texonom.

The goal is local, private and personal AI that makes no requests to external APIs, made practical by optimizing inference performance with GPTQ model quantization. This project was inspired by LangChain projects such as notion-qa and localGPT.
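The retrieval step behind these reference-backed answers can be pictured with a minimal sketch. It assumes documents have already been embedded as vectors and ranks them by cosine similarity in plain Python. The `Doc` record, the prompt template, the example.com URLs and the toy two-dimensional embeddings are hypothetical illustrations, not this project's code or its vector database API.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical document record: chunk text, source URL and a precomputed embedding."""
    text: str
    url: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 when either vector has zero length
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], docs: list[Doc], k: int = 2) -> list[Doc]:
    # Rank every document against the query and keep the k most similar
    ranked = sorted(docs, key=lambda d: cosine(query_embedding, d.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_docs: list[Doc]) -> str:
    # Put the retrieved chunks into the prompt so the LLM answers from them
    context = "\n\n".join(d.text for d in context_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        Doc("GPTQ is a post-training quantization method.", "https://example.com/gptq", [0.9, 0.1]),
        Doc("Streamlit builds web apps in Python.", "https://example.com/streamlit", [0.1, 0.9]),
    ]
    top = retrieve([1.0, 0.0], docs, k=1)  # toy query embedding close to the GPTQ doc
    print(build_prompt("What is GPTQ?", top))
    print("Sources:", [d.url for d in top])  # → Sources: ['https://example.com/gptq']
```

Returning the source URL alongside each chunk is what lets a chat UI show reference links next to the generated answer.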

Demos

CLI Demo

https://github.com/seonglae/llama2gptq/assets/27716524/dba5cd39-ea5c-44d9-bf29-2e8f04039413

Chat Demo

https://github.com/seonglae/llama2gptq/assets/27716524/258de629-0b61-4670-b76b-9f2357adf4c7


Install

This project uses Rye as its package manager. Currently only CUDA is supported.

rye sync

or using pip

CUDA_VERSION=cu118
TORCH_VERSION=2.0.1
pip install torch==$TORCH_VERSION --index-url https://download.pytorch.org/whl/$CUDA_VERSION --force-reinstall
pip install .
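After installing, a quick sanity check is to confirm that the installed PyTorch build can see the GPU. The `cuda_report` helper below is hypothetical; it only reads `torch.__version__`, `torch.cuda.is_available()` and `torch.version.cuda`, and returns empty values when PyTorch is not installed.

```python
def cuda_report() -> dict:
    """Summarize the installed PyTorch build; values are None/False if torch is missing."""
    try:
        import torch  # installed by the pip commands above
    except ImportError:
        return {"torch": None, "cuda_available": False, "cuda_version": None}
    return {
        "torch": str(torch.__version__),  # CUDA wheels carry a suffix such as "+cu118"
        "cuda_available": torch.cuda.is_available(),  # True only if a usable GPU and driver are found
        "cuda_version": torch.version.cuda,  # CUDA version the wheel was built with, None for CPU builds
    }


if __name__ == "__main__":
    print(cuda_report())
```

With the cu118 wheels installed on a working GPU machine, `cuda_available` should be `True` and `cuda_version` should report an 11.8 build; a `None` CUDA version usually means a CPU-only wheel was installed instead.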

QA

1. Chat with Web UI

streamlit run chat.py

2. Chat with CLI

python main.py chat

Ingest Documents

The current code is mainly designed around CSV data exported from Notion.

Custom source documents

# Put your document files in the ./knowledge folder
python main.py process
# Or use the provided Texonom DB
git clone https://huggingface.co/datasets/texonom/md-chroma-instructor-xl db
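Before embedding, ingestion pipelines of this kind typically split each document into overlapping chunks. The sketch below reads CSV text with Python's `csv` module and cuts one text column into fixed-size character windows. The column names (`Title`, `Body`), chunk size and overlap are assumptions for illustration, not the settings `python main.py process` actually uses.

```python
import csv
import io


def read_rows(csv_text: str) -> list[dict[str, str]]:
    # DictReader maps each row to {column name: cell value} using the header line
    return list(csv.DictReader(io.StringIO(csv_text)))


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size character windows; consecutive chunks share `overlap` characters
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def rows_to_chunks(rows, text_column, source_column, size=500, overlap=50):
    # Keep the source column next to each chunk so answers can cite it later
    chunks = []
    for row in rows:
        for piece in chunk_text(row.get(text_column) or "", size, overlap):
            chunks.append({"text": piece, "source": row.get(source_column) or ""})
    return chunks


if __name__ == "__main__":
    sample = "Title,Body\nGPTQ,GPTQ quantizes model weights to 4 bits.\n"  # hypothetical columns
    for chunk in rows_to_chunks(read_rows(sample), "Body", "Title", size=20, overlap=5):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice.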

Quantize Model

The default model is currently Orca 3B. The example below quantizes facebook/opt-125m.

python main.py quantize --source_model facebook/opt-125m --output opt-125m-4bit-gptq --push
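GPTQ stores weights as 4-bit integers plus per-group scale and zero-point values. The sketch below uses plain round-to-nearest quantization of one weight group to show that storage format; it is not GPTQ itself, which additionally uses approximate second-order information to compensate rounding error as it quantizes each column. `weight_memory_gb` gives a rough weight-only footprint: a 3B-parameter model such as the default Orca 3B needs about 6 GB at 16 bits and about 1.5 GB at 4 bits, before scale and zero-point overhead.

```python
def quantize_group(weights: list[float], bits: int = 4) -> tuple[list[int], float, int]:
    """Asymmetric round-to-nearest quantization of one group (not GPTQ's error-compensating update)."""
    qmax = (1 << bits) - 1  # largest integer code: 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)  # integer offset so that lo maps to code 0
    q = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return q, scale, zero


def dequantize_group(q: list[int], scale: float, zero: int) -> list[float]:
    # Recover approximate float weights from the stored integer codes
    return [(v - zero) * scale for v in q]


def weight_memory_gb(n_params: float, bits: int) -> float:
    # Weight-only footprint in decimal GB; ignores scales, zero points and activations
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    w = [-0.8, -0.1, 0.0, 0.35, 1.2]
    q, scale, zero = quantize_group(w)
    print(q, [round(x, 3) for x in dequantize_group(q, scale, zero)])
    print(weight_memory_gb(3e9, 16), weight_memory_gb(3e9, 4))  # 6.0 1.5
```

Each reconstructed weight lands within half a quantization step of the original; GPTQ's contribution is choosing the rounding so that the layer's output error, rather than each weight's error, stays small.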

Future Plan

  • [ ] MPS support via dynamic model selection
  • [ ] Stateful Web App support like chat-langchain

App Stack

LLM Stack

Python Stack

  • Rye for package management
  • Mypy for type checking
  • Fire for CLI implementation
  • Streamlit for Web UI implementation
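Fire turns a Python object into a command-line interface: each public method becomes a subcommand and its parameters become `--flags`, so a bare `--push` sets a boolean parameter to `True`. The `Cli` class below is a hypothetical skeleton whose method names mirror the commands shown in this README (`chat`, `process`, `quantize`); the bodies are placeholders, not the project's `main.py`.

```python
class Cli:
    """Hypothetical skeleton; Fire exposes each public method as a subcommand."""

    def chat(self) -> str:
        # Placeholder for starting the interactive CLI chat loop
        return "chat"

    def process(self) -> str:
        # Placeholder for ingesting files from ./knowledge into the vector DB
        return "process"

    def quantize(self, source_model: str, output: str, push: bool = False) -> str:
        # Parameters map to --source_model, --output and the boolean --push flag
        return f"quantize {source_model} -> {output} (push={push})"


def main() -> None:
    import fire  # python-fire; imported lazily so the class can be used without it
    fire.Fire(Cli)
```

With `main()` called under `if __name__ == "__main__":`, running `python main.py quantize --source_model facebook/opt-125m --output opt-125m-4bit-gptq --push` would call `Cli().quantize(...)` with `push=True`, and Fire would print the returned string.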