llm-inference topic
OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
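Because the endpoint is OpenAI-compatible, clients talk to it with the standard chat-completions request shape. A minimal sketch using only the standard library; the base URL, port, and model name are assumptions, not OpenLLM defaults:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request against an OpenAI-compatible /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical local endpoint; the actual host/port depend on how the server is launched.
req = build_chat_request("http://localhost:3000/v1", "llama3.1", "Hello!")
```

Sending the request with `urllib.request.urlopen(req)` would return the usual `choices[0].message.content` JSON shape.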
autogen
A programming framework for agentic AI 🤖 (PyPI: autogen-agentchat)
mistral-inference
Official inference library for Mistral models
LLM.swift
LLM.swift is a simple, readable library for interacting with large language models locally on macOS, iOS, watchOS, tvOS, and visionOS.
spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
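The core idea behind cascade token pruning is that tokens with low cumulative attention importance are dropped, and once dropped they stay dropped in all subsequent layers. An illustrative software sketch of that selection step (not the HPCA'21 hardware design; the scores and keep ratio are made up):

```python
def prune_tokens(importance, keep_ratio):
    """Return the sorted indices of the top tokens by cumulative attention importance.

    Illustrative sketch of cascade token pruning: tokens removed here are
    excluded from every later layer (hence "cascade").
    """
    k = max(1, int(len(importance) * keep_ratio))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical cumulative attention scores, one per token.
scores = [0.9, 0.1, 0.4, 0.05, 0.7]
kept = prune_tokens(scores, keep_ratio=0.6)  # keep the top 60% of tokens
```

Head pruning follows the same top-k pattern, ranking attention heads instead of tokens.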
friendli-client
Friendli: the fastest serving engine for generative AI
llm-vscode-inference-server
An endpoint server for efficiently serving quantized open-source LLMs for code.
tree-prompt
Tree prompting: easy-to-use scikit-learn interface for improved prompting.
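The idea behind tree prompting is to route an input through a decision tree whose internal nodes are yes/no prompts answered by a model. A toy sketch of that routing; the `mentions_*` stubs stand in for real LLM calls, and the library's actual scikit-learn-style fit/predict API is not reproduced here:

```python
def make_node(ask, yes, no):
    """Internal tree node: route the input by the prompt's boolean answer."""
    return lambda text: yes(text) if ask(text) else no(text)

def leaf(label):
    """Leaf node: always return a fixed label."""
    return lambda text: label

# Stub yes/no "prompts" (assumptions; a real tree would query a model).
mentions_error = lambda text: "error" in text.lower()
mentions_slow = lambda text: "slow" in text.lower()

classify = make_node(
    mentions_error,
    yes=leaf("bug report"),
    no=make_node(mentions_slow, yes=leaf("performance issue"), no=leaf("other")),
)
```

Fitting such a tree amounts to choosing which prompt to place at each node, which is where the scikit-learn-style interface comes in.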