airllm
Docker-based or bare-metal serving
Are there any plans to implement a serving mode, similar to vLLM's serving? Ideally it would expose OpenAI-compatible chat endpoints.
In the meantime, I would like to serve a loaded model on an endpoint in any form — what is the recommended way to do that?
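To make the request concrete, here is a minimal sketch of the kind of OpenAI-compatible endpoint I mean, using only the Python standard library. The `generate()` stub is a placeholder for a real AirLLM call (e.g. a model loaded via `airllm.AutoModel.from_pretrained`); AirLLM itself does not currently ship a serving API, so all names below are illustrative assumptions, not part of the library.

```python
import json
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate(prompt: str) -> str:
    # Placeholder: a real server would tokenize the prompt and call
    # generate() on an AirLLM-loaded model here (hypothetical wiring).
    return f"(echo) {prompt}"


def build_chat_completion(reply: str, model: str) -> dict:
    # Shape the response the way the OpenAI /v1/chat/completions API does.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }
        ],
    }


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        prompt = body["messages"][-1]["content"]  # last message in the chat
        resp = build_chat_completion(generate(prompt), body.get("model", "airllm"))
        payload = json.dumps(resp).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), ChatHandler).serve_forever()
```

A client could then point any OpenAI-compatible SDK at `http://localhost:8000/v1` — which is exactly what vLLM's `vllm serve` enables today.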