# LLaMa2 GPTQ
Chat AI that provides responses with reference documents via prompt engineering over a vector database, running locally with GPTQ 4-bit quantization. It also suggests related web pages through integration with my previous product, Texonom.

The goal is local, private, and personal AI without calls to an external API, attained by optimizing inference performance with GPTQ model quantization. This project was inspired by LangChain projects such as notion-qa and localGPT.
## Demos

### CLI Demo
https://github.com/seonglae/llama2gptq/assets/27716524/dba5cd39-ea5c-44d9-bf29-2e8f04039413
### Chat Demo
https://github.com/seonglae/llama2gptq/assets/27716524/258de629-0b61-4670-b76b-9f2357adf4c7
## Install

This project uses rye as its package manager. It is currently only available with CUDA.

```sh
rye sync
```
Or, using pip:

```sh
CUDA_VERSION=cu118
TORCH_VERSION=2.0.1
pip install torch==$TORCH_VERSION --index-url https://download.pytorch.org/whl/$CUDA_VERSION --force-reinstall
pip install .
```
## QA

### 1. Chat with Web UI

```sh
streamlit run chat.py
```

### 2. Chat with CLI

```sh
python main.py chat
```
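For context, answering with reference documents follows the usual retrieval-augmented pattern: embed the question, fetch similar chunks from the vector database, and feed them to the local GPTQ model as context. Below is a minimal sketch of that flow using LangChain's `RetrievalQA`; the model repo, embedding model, and `db` directory are illustrative assumptions rather than this project's exact code.

```python
# Sketch of a retrieval-QA flow over the Chroma DB built in "Ingest Documents".
# Model repo, embedding model, and the "db" path are assumptions for illustration.
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Load a 4-bit GPTQ model for fully local inference (no external API)
model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")
llm = HuggingFacePipeline(pipeline=pipeline(
    "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256))

# Retrieve reference documents from the persisted Chroma vector database
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
db = Chroma(persist_directory="db", embedding_function=embeddings)
qa = RetrievalQA.from_chain_type(
    llm=llm, retriever=db.as_retriever(), return_source_documents=True)

result = qa({"query": "What is GPTQ quantization?"})
print(result["result"])             # generated answer
print(result["source_documents"])   # reference documents used as context
```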
## Ingest Documents

Currently, the code structure mainly targets CSV data exported from Notion.

### Custom source documents

```sh
# Put document files into the ./knowledge folder
python main.py process

# Or use the provided Texonom DB
git clone https://huggingface.co/datasets/texonom/md-chroma-instructor-xl db
```
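For reference, the processing step boils down to load, split, embed, and persist. Here is a minimal sketch assuming LangChain loaders, `instructor-xl` embeddings (matching the naming of the Texonom DB above), and a Chroma store in `./db`; the actual loaders and chunk parameters in `main.py` may differ.

```python
# Rough sketch of `python main.py process`: load documents, split into chunks,
# embed the chunks, and persist the vectors to a local Chroma DB.
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

documents = DirectoryLoader("knowledge").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(documents)  # sizes are illustrative

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
Chroma.from_documents(chunks, embeddings, persist_directory="db").persist()
```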
## Quantize Model

The default model is Orca 3B for now.

```sh
python main.py quantize --source_model facebook/opt-125m --output opt-125m-4bit-gptq --push
```
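Under the hood this wraps AutoGPTQ. The sketch below follows AutoGPTQ's documented quantization flow; the calibration sentence and quantization parameters are illustrative, and pushing to the Hub is left out.

```python
# GPTQ 4-bit quantization with AutoGPTQ, following its documented API.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

source_model = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(source_model)

# GPTQ needs tokenized calibration examples to estimate quantization error
examples = [tokenizer("GPTQ quantizes weights layer by layer using calibration data.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(source_model, quantize_config)
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```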
## Future Plan

- [ ] MPS support via dynamic model selection
- [ ] Stateful web app support like chat-langchain
## App Stack

### LLM Stack

- LangChain for prompt engineering
- ChromaDB for storing embeddings
- Transformers as the LLM engine
- AutoGPTQ for quantization & inference