
📚 READ: Architecture Overview & Roadmap

Open · mmirman opened this issue 2 years ago · 4 comments

High-Level Architecture Overview

[Image: LLM-VM architecture diagram]

  • [x] Implicit Agents 🔧🕵️: The Anarchy LLM-VM can be set up to use external tools through our agents, such as REBEL, just by supplying tool descriptions (see the tool sketch after this list)!

  • [ ] Inference Optimization 🚄: The Anarchy LLM-VM is optimized from the agent level all the way to assembly on known LLM architectures to get the most bang for your buck. With state-of-the-art batching, sparse inference and quantization, distillation, and multi-level colocation, we aim to provide the fastest framework available.

  • [x] Task Auto-Optimization 🚅: The Anarchy LLM-VM will analyze your use cases for repetitive tasks where it can activate student-teacher distillation to train a super-efficient small model from a larger, more general model without losing accuracy. It can furthermore take advantage of data-synthesis techniques to improve results.

  • [x] Library Callable 📚: We provide a library that can be used from any Python codebase directly (see the usage sketch after this list).

  • [ ] HTTP Endpoints 🕸️: We provide a standalone HTTP server to handle completion requests (see the request sketch after this list).

  • [ ] Live Data Augmentation 📊: You will be able to provide a live-updating data set, and the Anarchy LLM-VM will fine-tune your models or work with a vector DB to provide up-to-date information with citations.

  • [ ] Web Playground 🛝: You will be able to run the Anarchy LLM-VM and test its outputs from the browser.

  • [ ] Load-Balancing and Orchestration ⚖️: If you have multiple LLMs or providers you'd like to utilize, you will be able to hand them to the Anarchy LLM-VM, which will automatically figure out which to work with and when, optimizing for your uptime or your costs.

  • [x] Output Templating 🤵: You can ensure that the LLM only outputs data in specific formats and fills in variables from a template with either regular expressions, LMQL, or OpenAI's template language (see the templating sketch after this list).

  • [ ] Persistent Stateful Memory 📝: The Anarchy LLM-VM can remember a user's conversation history and react accordingly.

  • [ ] Smart Batching 🗞️: Handle multiple simultaneous calls from different levels of the LLM-VM.

  • [ ] Speculative Preemptive Sampling 🔮: Use a small LLM to predict the outputs of a larger LLM, falling back to the large one only when the small model's sampling degrades (see the sketch after this list).

  • [ ] Token Streaming 🚰: Get a hook for a constantly updating supply of tokens!

  • [ ] Streamed Backtracking 🔙: Didn't like one output? Look at others! Efficiently.

Unchecked items are projected features.
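To make the library-callable path concrete, here is a minimal sketch based on the client interface shown in the repository README; the model name and exact keyword arguments are assumptions and may differ between versions.

```python
# Minimal sketch of calling the LLM-VM as a library, modeled on the
# client interface in the repository README. The model name and the
# exact keyword arguments are assumptions and may vary by version.
from llm_vm.client import Client

client = Client(big_model='chat_gpt')  # pick a default completion model

response = client.complete(prompt='What is Anarchy?', context='')
print(response)
```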
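For implicit agents, the idea is to hand the client a tool description and let an agent such as REBEL decide when to call it. The tool-dictionary schema and the `tools` parameter below are illustrative assumptions, not a confirmed interface.

```python
# Hypothetical sketch of supplying a tool description so an agent such
# as REBEL can call an external API. The tool schema and the `tools`
# parameter are assumptions for illustration, not a confirmed interface.
from llm_vm.client import Client

client = Client(big_model='chat_gpt')

weather_tool = {
    'description': 'Returns the current weather for a city.',
    'url': 'https://api.example.com/weather',  # hypothetical endpoint
    'method': 'GET',
    'dynamic_params': {'city': 'Name of the city to look up.'},
}

response = client.complete(
    prompt='What is the weather in Nairobi right now?',
    context='',
    tools=[weather_tool],
)
print(response)
```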
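A completion request to the standalone HTTP server might look like the following; the port and route here are assumptions for illustration, so check the server source for the real values.

```python
# Hypothetical completion request to the standalone HTTP server.
# The port and route below are assumptions; check the server's
# source or docs for the actual values.
import requests

payload = {'prompt': 'What is Anarchy?', 'context': ''}
resp = requests.post('http://localhost:3002/v1/complete', json=payload)
resp.raise_for_status()
print(resp.json())
```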
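The weakest useful templating guarantee is post-hoc validation: check the completion against a regular expression and retry on mismatch (constrained sampling, discussed later in this thread, is the stronger guarantee). This sketch uses only the standard library; `generate` stands in for any completion call.

```python
# Post-hoc regex templating: validate a completion against a pattern
# and retry until it conforms. `generate` stands in for any completion
# call, e.g. the client sketched above.
import re

ISO_DATE = re.compile(r'\d{4}-\d{2}-\d{2}')  # e.g. 2023-09-18

def templated_complete(generate, prompt, pattern=ISO_DATE, max_tries=3):
    for _ in range(max_tries):
        output = generate(prompt)
        match = pattern.fullmatch(output.strip())
        if match:
            return match.group(0)
    raise ValueError('model never produced a conforming output')
```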
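For speculative sampling, the usual draft-and-verify loop looks roughly like this toy sketch; the two model functions are stand-ins, and a real implementation verifies the whole draft with one batched forward pass of the large model and accepts or rejects tokens probabilistically.

```python
# Toy sketch of speculative sampling: a cheap draft model proposes k
# tokens, the expensive model checks them, and we keep the longest
# agreeing prefix. `draft_next` and `verify_next` are stand-ins that
# map a token sequence to the next token.
def speculative_step(draft_next, verify_next, prefix, k=4):
    # Small model drafts k tokens cheaply.
    ctx = list(prefix)
    draft = []
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # Large model checks the draft; keep the longest agreeing prefix.
    accepted = []
    ctx = list(prefix)
    for tok in draft:
        big_tok = verify_next(ctx)
        if big_tok != tok:
            accepted.append(big_tok)  # disagreement: take the large
            break                     # model's token and stop early
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```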

mmirman · Sep 18 '23 11:09

Note: Some of these have existing issues attached to them! This needs cleanup and organization (this is high-priority).

mmirman · Sep 18 '23 12:09

I think that some features build on others. For example, Output Templating can depend on Persistent Stateful Memory by remembering the template. I have used LangChain, and the way they do their templating is by injecting it into the user's input as a header for every call; having memory can eliminate this (a quick sketch of the contrast follows). Additionally, backtracking can also be extended to the persistent state, not just to outputs. In conclusion, I believe Persistent Stateful Memory should come before Output Templating, and perhaps at the same time as backtracking.
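To make the contrast concrete, here is a minimal sketch of per-call header injection versus a session that keeps the template in persistent state; all names here are illustrative.

```python
# Illustrative contrast between per-call template injection (the
# LangChain-style approach described above) and keeping the template
# in persistent session state. All names are hypothetical.
TEMPLATE_HEADER = 'Answer in JSON with keys "answer" and "sources".\n\n'

def complete_with_injection(generate, user_input):
    # The caller must prepend the template on every single call.
    return generate(TEMPLATE_HEADER + user_input)

class Session:
    def __init__(self, generate, template):
        self.generate = generate
        self.template = template  # remembered once by the session

    def complete(self, user_input):
        # The caller no longer carries the template; the session's
        # persistent state applies it on each call.
        return self.generate(self.template + user_input)
```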

Avr-Arnold · Sep 22 '23 02:09

"the way they do their templating is by injecting it into the user's input as a header for every call"

We want strong guarantees from our templating: the ability to constrain sampling with a regex, for example (see the sketch below).
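A minimal sketch of what regex-constrained sampling means at the decoding loop: before emitting each token, mask out candidates that can no longer extend to a full match. This uses the third-party `regex` module for its partial-match support; real implementations (LMQL, or automaton-based libraries) compile the pattern instead of re-matching every candidate, which would be far too slow.

```python
# Sketch of regex-constrained sampling: at each decoding step, allow
# only tokens that keep the pattern satisfiable. Uses the third-party
# `regex` module for partial matching; production systems compile the
# pattern to an automaton rather than re-matching every candidate.
import regex

def allowed_tokens(pattern, prefix, vocab):
    ok = []
    for tok in vocab:
        # partial=True accepts strings that could still become a match.
        if regex.fullmatch(pattern, prefix + tok, partial=True):
            ok.append(tok)
    return ok

# Toy example: constrain output to an ISO date such as 2023-10-03.
vocab = list('0123456789-x')
print(allowed_tokens(r'\d{4}-\d{2}-\d{2}', '2023-1', vocab))
# only digits survive here; '-' and 'x' are masked out
```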

mmirman · Oct 03 '23 19:10

Updated Short-Term LLM-VM Roadmap

[Image: updated short-term LLM-VM roadmap]

VictorOdede · Oct 19 '23 14:10