
📚 READ: Architecture Overview & Roadmap

Open · mmirman opened this issue 2 years ago · 4 comments

High-Level Architecture Overview

[Image: LLM-VM architecture diagram]

  • [x] Implicit Agents 🔧🕵️: The Anarchy LLM-VM can be set up to use external tools through our agents, such as REBEL, just by supplying tool descriptions (see the tool sketch after this list)!

  • [ ] Inference Optimization 🚄: The Anarchy LLM-VM is optimized from the agent level all the way to assembly on known LLM architectures to get the most bang for your buck. With state-of-the-art batching, sparse inference and quantization, distillation, and multi-level colocation, we aim to provide the fastest framework available.

  • [x] Task Auto-Optimization 🚅: The Anarchy LLM-VM will analyze your use cases for repetitive tasks where it can activate student-teacher distillation to train a super-efficient small model from a larger, more general model without losing accuracy. It can furthermore take advantage of data-synthesis techniques to improve results.

  • [x] Library Callable 📚: We provide a library that can be used from any Python codebase directly (see the usage sketch after this list).

  • [ ] HTTP Endpoints 🕸️: We provide a standalone HTTP server to handle completion requests (see the request sketch after this list).

  • [ ] Live Data Augmentation 📊: You will be able to provide a live-updating data set, and the Anarchy LLM-VM will fine-tune your models or work with a vector DB to provide up-to-date information with citations.

  • [ ] Web Playground 🛝: You will be able to run the Anarchy LLM-VM and test its outputs from the browser.

  • [ ] Load-Balancing and Orchestration ⚖️: If you have multiple LLMs or providers you'd like to utilize, you will be able to hand them to the Anarchy LLM-VM, which will automatically figure out which to work with and when, optimizing for your uptime or your costs.

  • [x] Output Templating 🤵: You can ensure that the LLM only outputs data in specific formats and fills in variables from a template with either regular expressions, LMQL, or OpenAI's template language (see the templating sketch after this list).

  • [ ] Persistent Stateful Memory 📝: The Anarchy LLM-VM can remember a user's conversation history and react accordingly.

  • [ ] Smart Batching 🗞️: Handle multiple simultaneous calls from different levels of the LLM-VM.

  • [ ] Speculative Preemptive Sampling 🔮: Use a small LLM to predict the outputs of a larger LLM, falling back to the large one only when the small model's sampling degrades (see the sketch after this list).

  • [ ] Token Streaming 🚰: Get a hook for a constantly updating supply of tokens!

  • [ ] Streamed Backtracking 🔙: Didn't like one output? Look at others! Efficiently.

Unchecked items are projected features.
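To make the library-callable path concrete, here is a minimal sketch based on the client interface shown in the repository README; the model name and exact keyword arguments are assumptions and may differ between versions.

```python
# Minimal sketch of calling the LLM-VM as a library, modeled on the
# client interface in the repository README. The model name and the
# exact keyword arguments are assumptions and may vary by version.
from llm_vm.client import Client

client = Client(big_model='chat_gpt')  # pick a default completion model

response = client.complete(prompt='What is Anarchy?', context='')
print(response)
```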
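For implicit agents, the idea is to hand the client a tool description and let an agent such as REBEL decide when to call it. The tool-dictionary schema and the `tools` parameter below are illustrative assumptions, not a confirmed interface.

```python
# Hypothetical sketch of supplying a tool description so an agent such
# as REBEL can call an external API. The tool schema and the `tools`
# parameter are assumptions for illustration, not a confirmed interface.
from llm_vm.client import Client

client = Client(big_model='chat_gpt')

weather_tool = {
    'description': 'Returns the current weather for a city.',
    'url': 'https://api.example.com/weather',  # hypothetical endpoint
    'method': 'GET',
    'dynamic_params': {'city': 'Name of the city to look up.'},
}

response = client.complete(
    prompt='What is the weather in Nairobi right now?',
    context='',
    tools=[weather_tool],
)
print(response)
```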
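A completion request to the standalone HTTP server might look like the following; the port and route here are assumptions for illustration, so check the server source for the real values.

```python
# Hypothetical completion request to the standalone HTTP server.
# The port and route below are assumptions; check the server's
# source or docs for the actual values.
import requests

payload = {'prompt': 'What is Anarchy?', 'context': ''}
resp = requests.post('http://localhost:3002/v1/complete', json=payload)
resp.raise_for_status()
print(resp.json())
```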
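The weakest useful templating guarantee is post-hoc validation: check the completion against a regular expression and retry on mismatch (constrained sampling, discussed later in this thread, is the stronger guarantee). This sketch uses only the standard library; `generate` stands in for any completion call.

```python
# Post-hoc regex templating: validate a completion against a pattern
# and retry until it conforms. `generate` stands in for any completion
# call, e.g. the client sketched above.
import re

ISO_DATE = re.compile(r'\d{4}-\d{2}-\d{2}')  # e.g. 2023-09-18

def templated_complete(generate, prompt, pattern=ISO_DATE, max_tries=3):
    for _ in range(max_tries):
        output = generate(prompt)
        match = pattern.fullmatch(output.strip())
        if match:
            return match.group(0)
    raise ValueError('model never produced a conforming output')
```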
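For speculative sampling, the usual draft-and-verify loop looks roughly like this toy sketch; the two model functions are stand-ins, and a real implementation verifies the whole draft with one batched forward pass of the large model and accepts or rejects tokens probabilistically.

```python
# Toy sketch of speculative sampling: a cheap draft model proposes k
# tokens, the expensive model checks them, and we keep the longest
# agreeing prefix. `draft_next` and `verify_next` are stand-ins that
# map a token sequence to the next token.
def speculative_step(draft_next, verify_next, prefix, k=4):
    # Small model drafts k tokens cheaply.
    ctx = list(prefix)
    draft = []
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # Large model checks the draft; keep the longest agreeing prefix.
    accepted = []
    ctx = list(prefix)
    for tok in draft:
        big_tok = verify_next(ctx)
        if big_tok != tok:
            accepted.append(big_tok)  # disagreement: take the large
            break                     # model's token and stop early
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```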

mmirman · Sep 18 '23 11:09

Note: Some of these have existing issues attached to them! This needs cleanup and organization (this is high-priority).

mmirman · Sep 18 '23 12:09

I think that some features build on others. For example, Output Templating can depend on Persistent Stateful Memory by remembering the template. I have used LangChain, and the way they do their templating is by injecting it into the user's input as a header for every call; having memory can eliminate this (a quick sketch of the contrast follows). Additionally, backtracking can also be extended to the persistent state, not just to outputs. In conclusion, I believe Persistent Stateful Memory should come before Output Templating, and perhaps at the same time as backtracking.
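To make the contrast concrete, here is a minimal sketch of per-call header injection versus a session that keeps the template in persistent state; all names here are illustrative.

```python
# Illustrative contrast between per-call template injection (the
# LangChain-style approach described above) and keeping the template
# in persistent session state. All names are hypothetical.
TEMPLATE_HEADER = 'Answer in JSON with keys "answer" and "sources".\n\n'

def complete_with_injection(generate, user_input):
    # The caller must prepend the template on every single call.
    return generate(TEMPLATE_HEADER + user_input)

class Session:
    def __init__(self, generate, template):
        self.generate = generate
        self.template = template  # remembered once by the session

    def complete(self, user_input):
        # The caller no longer carries the template; the session's
        # persistent state applies it on each call.
        return self.generate(self.template + user_input)
```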

Avr-Arnold · Sep 22 '23 02:09

"the way they do their templating is by injecting it into the user's input as a header for every call"

We want strong guarantees from our templating: the ability to constrain sampling with a regex, for example (see the sketch below).
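A minimal sketch of what regex-constrained sampling means at the decoding loop: before emitting each token, mask out candidates that can no longer extend to a full match. This uses the third-party `regex` module for its partial-match support; real implementations (LMQL, or automaton-based libraries) compile the pattern instead of re-matching every candidate, which would be far too slow.

```python
# Sketch of regex-constrained sampling: at each decoding step, allow
# only tokens that keep the pattern satisfiable. Uses the third-party
# `regex` module for partial matching; production systems compile the
# pattern to an automaton rather than re-matching every candidate.
import regex

def allowed_tokens(pattern, prefix, vocab):
    ok = []
    for tok in vocab:
        # partial=True accepts strings that could still become a match.
        if regex.fullmatch(pattern, prefix + tok, partial=True):
            ok.append(tok)
    return ok

# Toy example: constrain output to an ISO date such as 2023-10-03.
vocab = list('0123456789-x')
print(allowed_tokens(r'\d{4}-\d{2}-\d{2}', '2023-1', vocab))
# only digits survive here; '-' and 'x' are masked out
```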

mmirman · Oct 03 '23 19:10

Updated Short-Term LLM-VM Roadmap

[Image: updated short-term LLM-VM roadmap]

VictorOdede · Oct 19 '23 14:10