serving topic
List
serving repositories
grps
147
Stars
13
Forks
Watchers
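[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, dynamic batching and streaming modes, and both Python and C++. Limitable, extensible, and high-performance. Helps users quickly deploy models online and, through http/rpc ...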
tiny-llm
3.4k
Stars
230
Forks
3.4k
Watchers
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.