inference-server topic
truss
The simplest way to serve AI/ML models in production
pipeless
An open-source computer vision framework to build and deploy apps in minutes
inference
A fast, easy-to-use, production-ready inference server for computer vision, supporting deployment of many popular model architectures and fine-tuned models.
inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
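Benchmarks of this kind typically drive a serving endpoint with concurrent requests and report throughput plus latency percentiles. A minimal sketch of that measurement loop follows; the endpoint URL and JSON payload are placeholders for illustration, not taken from this repository:

```python
# Minimal online-serving benchmark sketch: fire N concurrent requests
# at a serving endpoint and report throughput and latency percentiles.
# ENDPOINT and PAYLOAD are illustrative placeholders.
import json
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib import request

ENDPOINT = "http://localhost:8000/v1/predict"  # placeholder URL
PAYLOAD = json.dumps({"prompt": "hello"}).encode()

def one_request() -> float:
    """Send one request and return its latency in seconds."""
    req = request.Request(ENDPOINT, data=PAYLOAD,
                          headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def benchmark(total: int = 100, concurrency: int = 8) -> None:
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: one_request(), range(total)))
    wall = time.perf_counter() - wall_start
    print(f"throughput: {total / wall:.1f} req/s")
    print(f"p50: {statistics.median(latencies) * 1e3:.1f} ms")
    print(f"p99: {latencies[int(0.99 * (total - 1))] * 1e3:.1f} ms")

if __name__ == "__main__":
    benchmark()
```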
fullstack-machine-learning-inference
Fullstack machine learning inference template
onnxruntime-server
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
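For context, a server like this wraps ONNX Runtime sessions behind its network APIs. This is what the equivalent single local inference looks like with the onnxruntime Python package; "model.onnx" and the input shape are placeholders for your own exported model:

```python
# Local ONNX Runtime inference: the server exposes this kind of
# session over TCP/HTTP. The model path and input shape below are
# placeholders; substitute your own exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")      # load the graph
input_name = session.get_inputs()[0].name         # first graph input
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: batch})  # None = all outputs
print(outputs[0].shape)
```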
Triton-TensorRT-Inference-CRAFT-pytorch
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch); includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server -...
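The first step in such a conversion chain is a standard torch.onnx.export call. A minimal sketch with a small stand-in module (the real CRAFT detector is not reproduced here):

```python
# Sketch of the PyTorch -> ONNX step of the conversion chain.
# TinyDetector is a stand-in for the real text-detection network;
# the export call itself is the standard torch.onnx.export API.
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Placeholder for the real text-detection network."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 2, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = TinyDetector().eval()
dummy = torch.randn(1, 3, 224, 224)  # example input that fixes shapes
torch.onnx.export(
    model, dummy, "detector.onnx",
    input_names=["image"], output_names=["score_map"],
    dynamic_axes={"image": {0: "batch"}},  # allow variable batch size
)
```

The subsequent ONNX -> TensorRT step is typically done with TensorRT's trtexec tool (e.g. `trtexec --onnx=detector.onnx`) or its Python API; see the repository for the exact pipeline it ships.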
friendli-client
Friendli: the fastest serving engine for generative AI
wingman
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Simple-Inference-Server
Inference Server Implementation from Scratch for Machine Learning Models
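For a sense of what "from scratch" means here: a minimal HTTP inference server is little more than a request handler wrapped around a loaded model. A stdlib-only sketch, where the model is a trivial stand-in and the /predict route and JSON schema are illustrative rather than taken from the repository:

```python
# Bare-bones HTTP inference server sketch using only the standard
# library. The "model" is a trivial stand-in; the /predict route and
# JSON schema are illustrative, not from the repo.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def model(features):
    """Stand-in model: sum of the inputs."""
    return {"prediction": sum(features)}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(model(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

A real server layered on this skeleton would add batching, model loading/versioning, health checks, and concurrency, which is roughly the feature set the production-grade servers in this list provide.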