catai
Run an AI ✨ assistant locally, with a simple API for Node.js 🚀

CatAI
Run GGUF models on your computer with a chat UI.
Your own AI assistant runs locally on your computer.
Inspired by Node-Llama-Cpp, Llama.cpp
Installation & Use
Make sure you have Node.js (download current) installed.
npm install -g catai
catai install vicuna-7b-16k-q4_k_s
catai up
Features
- Auto-detect programming language 🧑‍💻
- Click on user icon to show original message 💬
- Real time text streaming ⏱️
- Fast model downloads 🚀
CLI
Usage: catai [options] [command]

Options:
  -V, --version                      output the version number
  -h, --help                         display help for command

Commands:
  install|i [options] [models...]    Install any GGUF model
  models|ls [options]                List all available models
  use [model]                        Set model to use
  serve|up [options]                 Open the chat website
  update                             Update server to the latest version
  active                             Show active model
  remove|rm [options] [models...]    Remove a model
  uninstall                          Uninstall server and delete all models
  help [command]                     display help for command
Install command
Usage: cli install|i [options] [models...]

Install any GGUF model

Arguments:
  models                  Model name/url/path

Options:
  -t --tag [tag]          The name of the model in local directory
  -l --latest             Install the latest version of a model (may be unstable)
  -b --bind [bind]        The model binding method
  -bk --bind-key [key]    key/cookie that the binding requires
  -h, --help              display help for command
Cross-platform
You can use it on Windows, Linux and Mac.
This package uses node-llama-cpp which supports the following platforms:
- darwin-x64
- darwin-arm64
- linux-x64
- linux-arm64
- linux-armv7l
- linux-ppc64le
- win32-x64-msvc
Memory usage
CatAI runs on most modern computers; only very old machines are likely to have trouble.
According to a llama.cpp discussion thread, here are the memory requirements:
- 7B => ~4 GB
- 13B => ~8 GB
- 30B => ~16 GB
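As a rough illustration of how the figures above might be used, the sketch below compares them with the machine's total RAM. The helper name `hasEnoughMemory` and the lookup table are hypothetical and not part of CatAI; the numbers are the approximate values listed above. It checks total memory rather than free memory, so treat the result as a coarse hint only.

```javascript
const os = require('os');

// Approximate RAM requirements (in GB) copied from the list above.
const APPROX_MEMORY_GB = { '7B': 4, '13B': 8, '30B': 16 };

// Hypothetical helper, not part of CatAI: returns true when the total
// system RAM is at least the approximate requirement for the model size.
function hasEnoughMemory(modelSize, totalBytes = os.totalmem()) {
  const requiredGB = APPROX_MEMORY_GB[modelSize.toUpperCase()];
  if (requiredGB === undefined) {
    throw new Error(`No memory estimate for model size "${modelSize}"`);
  }
  return totalBytes >= requiredGB * 1024 ** 3;
}
```

For example, `hasEnoughMemory('13B')` checks the current machine, while passing a second argument (a byte count) lets you test a different amount of RAM.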
Good to know
- All downloaded data is stored in the ~/catai folder by default.
- The download is multi-threaded, so it may use a lot of bandwidth, but it will download faster!
Web API
There is also a simple API that you can use to ask the model questions.
const response = await fetch('http://127.0.0.1:3000/api/chat/prompt', {
    method: 'POST',
    body: JSON.stringify({
        prompt: 'Write me 100 words story'
    }),
    headers: {
        'Content-Type': 'application/json'
    }
});

const data = await response.text();
For more information, please read the API guide
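The request above can be wrapped in a small reusable function with basic error handling. This is a minimal sketch, not an official client: the names `askCatAI`, `baseUrl`, and `fetchImpl` are hypothetical, and it assumes the server returns a non-2xx status when a request fails. The endpoint and request body are the ones shown above.

```javascript
// Hypothetical wrapper around the POST request shown above; not part of CatAI.
// `fetchImpl` is injectable so the function can be exercised without a running server.
async function askCatAI(prompt, { baseUrl = 'http://127.0.0.1:3000', fetchImpl = globalThis.fetch } = {}) {
  const response = await fetchImpl(`${baseUrl}/api/chat/prompt`, {
    method: 'POST',
    body: JSON.stringify({ prompt }),
    headers: { 'Content-Type': 'application/json' }
  });

  // Assumption: a failed request is signalled by a non-2xx status code.
  if (!response.ok) {
    throw new Error(`CatAI request failed with status ${response.status}`);
  }

  // The response body is read as plain text, as in the example above.
  return response.text();
}
```

With `catai up` running, `await askCatAI('Write me 100 words story')` would send the same request as the example above. The default relies on the global `fetch`, which is available in recent Node.js versions.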
Configuration
You can edit the configuration via the web UI.
More information here
Contributing
Contributions are welcome!
Please read our contributing guide to get started.
License
This project uses Llama.cpp to run models on your computer, so any license that applies to Llama.cpp also applies to this project.

If you like this repo, star it ✨