llm_client
An Interface for Deterministic Signals from Probabilistic LLM Vibes
Table of Contents
- About The Project
- Getting Started
- Roadmap
- Contributing
- License
- Contact
A Rust interface for the OpenAI API and the llama.cpp ./server API
- A unified API for testing and integrating OpenAI and Hugging Face LLMs.
- Load models from Hugging Face with just a URL.
- Uses the llama.cpp server API rather than bindings, so the project remains usable as long as that API stays stable.
- Prebuilt agents - not chatbots - to unlock the true power of LLMs.
Easily switch between models and APIs
// Use an OpenAI model
let llm_definition = LlmDefinition::OpenAiLlm(OpenAiDef::Gpt35Turbo);
// Or use a model from hugging face
let llm_definition: LlmDefinition = LlmDefinition::LlamaLlm(LlamaDef::new(
MISTRAL7BCHAT_MODEL_URL,
LlamaPromptFormat::Mistral7BChat,
Some(9001), // Max tokens for model AKA context size
Some(2), // Number of threads to use for server
Some(22), // Layers to load to GPU. Dependent on VRAM
Some(false), // This starts the llama.cpp server with embedding flag disabled
Some(true), // Logging enabled
));
let response = basic_text_gen::generate(
&llm_definition, // already an LlmDefinition; no need to wrap it again
Some("Howdy!"),
)
.await?;
eprintln!("{}", response);
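Because both backends are plain LlmDefinition values, callers can stay backend-agnostic. A minimal sketch of such a wrapper, assuming the basic_text_gen::generate signature and return type implied by the example above:

```rust
// Hypothetical helper: run one prompt against whichever backend was configured.
// Assumes generate(&LlmDefinition, Option<&str>) -> Result<String, _> as implied above.
async fn run_prompt(
    llm_definition: &LlmDefinition,
    prompt: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let response = basic_text_gen::generate(llm_definition, Some(prompt)).await?;
    Ok(response)
}
```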
Get deterministic responses from LLMs
if !boolean_classifier::classify(
llm_definition,
Some(hopefully_a_list), // the text feature being classified
Some("Is the attached feature a list of content split into discrete entries?"),
)
.await?
{
panic!("{}, was not properly split into a list!", hopefully_a_list)
}
Create embeddings*
let client_openai: ProviderClient =
ProviderClient::new(&LlmDefinition::OpenAiLlm(OpenAiDef::EmbeddingAda002), None).await;
let _: Vec<Vec<f32>> = client_openai
.generate_embeddings(
&vec![
"Hello, my dog is cute".to_string(),
"Hello, my cat is cute".to_string(),
],
Some(EmbeddingExceedsMaxTokensBehavior::Panic),
)
.await
.unwrap();
*Currently with limited support for llama.cpp.
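The returned vectors can be compared directly, for example with cosine similarity to score how close two embedded texts are. A minimal sketch over the Vec<Vec<f32>> output above (this helper is illustrative, not part of llm_client):

```rust
// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```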
Start Llama.cpp via CLI
cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
llama server listening at http://localhost:8080
cargo run -p llm_client --bin server_runner stop
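To confirm from code that the server actually came up, a crude readiness probe is to check whether anything is listening on the default port. A minimal sketch using only the standard library (the address assumes the default localhost:8080 shown above):

```rust
use std::net::TcpStream;

fn main() {
    // The llama.cpp server listens at localhost:8080 by default (see above).
    match TcpStream::connect("127.0.0.1:8080") {
        Ok(_) => println!("server is up"),
        Err(e) => eprintln!("server not reachable: {e}"),
    }
}
```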
Download HF models via CLI
cargo run -p llm_client --bin model_loader_cli --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
Dependencies
async-openai is used to interact with the OpenAI API. A modified version of the async-openai crate is used for the llama.cpp server. If you just need an OpenAI API interface, I suggest using the async-openai crate.
Hugging Face's Rust client is used for model downloads from the Hugging Face Hub.
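The llama.cpp server exposes an OpenAI-compatible endpoint, which is what makes a modified async-openai client viable for it. For illustration only, pointing stock async-openai at a local server looks roughly like this (the /v1 base path is an assumption, and this is not how llm_client wires it internally):

```rust
use async_openai::{config::OpenAIConfig, Client};

fn main() {
    // Aim the stock async-openai client at a local llama.cpp server
    // instead of api.openai.com (the /v1 path is an assumption).
    let config = OpenAIConfig::new().with_api_base("http://localhost:8080/v1");
    let _client = Client::with_config(config);
}
```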
(back to top)
Getting Started
Step-by-step guide
- Clone repo:
git clone https://github.com/ShelbyJenkins/llm_client.git
cd llm_client
- Optional: build the devcontainer from llm_client/.devcontainer/devcontainer.json. This will build a dev container with the NVIDIA dependencies installed.
- Add llama.cpp:
git submodule init
git submodule update
- Build llama.cpp (this is dependent on your hardware; please see the full instructions here):
# Example build for NVIDIA GPUs
cd llm_client/src/providers/llama_cpp/llama_cpp
make LLAMA_CUDA=1
- Test llama.cpp ./server
cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
This will download and load the given model, and then start the server.
When you see llama server listening at http://localhost:8080, you can load the llama.cpp UI in your browser. Stop the server with cargo run -p llm_client --bin server_runner stop.
- Using OpenAI: add a .env file in the llm_client dir with the var OPENAI_API_KEY=<key>
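To verify the key is visible to your process before running, a minimal sketch assuming the dotenvy crate (llm_client may load the file differently):

```rust
fn main() {
    // Load llm_client/.env if present, then check the variable (dotenvy is an assumption).
    dotenvy::dotenv().ok();
    match std::env::var("OPENAI_API_KEY") {
        Ok(_) => println!("OPENAI_API_KEY is set"),
        Err(_) => eprintln!("OPENAI_API_KEY is missing"),
    }
}
```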
Examples
- Interacting with the provided agents.
- Interacting with the llm_client directly.
Roadmap
- Handle the various prompt formats of LLM models more gracefully
- Unit tests
- Add additional classifier agents:
- many from many
- one from many
- Implement all OpenAI functionality with llama.cpp
- More external APIs (Claude, etc.)
(back to top)
Contributing
This is my first Rust crate. All contributions and feedback are more than welcome!
(back to top)
License
Distributed under the MIT License. See LICENSE.txt for more information.
(back to top)
Contact
Shelby Jenkins - Here or LinkedIn
(back to top)