sglang
Triton support
Hello, curious if we can already use sglang as a backend for NVIDIA's Triton Server.
Amazing work with the library btw, love it!
I would think so yeah. At least that's the one I was looking at although I don't really know how to do it :)
It would be awesome if support for triton inference server backend was added ...
I added a minimal example of serving sglang with Triton Inference Server in this pull request: https://github.com/sgl-project/sglang/pull/242
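For illustration, a client call against such a deployment might look roughly like the sketch below. It is not taken from the pull request: the model name `sglang_model` and the tensor names `text_input` / `text_output` are assumptions and would have to match whatever the deployed `config.pbtxt` declares. Only the endpoint path and the JSON layout follow Triton's HTTP/REST (KServe v2) inference protocol.

```python
import json
import urllib.request

TRITON_URL = "http://localhost:8000"  # Triton's default HTTP port; adjust as needed
MODEL_NAME = "sglang_model"           # hypothetical model name, not from the PR


def build_infer_request(prompt, model_name=MODEL_NAME, base_url=TRITON_URL):
    """Build the URL and JSON body for a KServe v2 inference request."""
    url = f"{base_url}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": "text_input",   # assumed input tensor name
                "shape": [1],
                "datatype": "BYTES",    # strings are sent as BYTES tensors
                "data": [prompt],
            }
        ]
    }
    return url, body


def parse_infer_response(response):
    """Pull the generated text out of a decoded KServe v2 response dict."""
    for out in response.get("outputs", []):
        if out["name"] == "text_output":  # assumed output tensor name
            return out["data"][0]
    raise KeyError("text_output not found in response")


def infer(prompt):
    """Send the prompt to a running Triton server and return the generated text."""
    url, body = build_infer_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_infer_response(json.load(resp))
```

Calling `infer("Hello")` requires a running server with a matching model loaded; the two helper functions can be checked without one.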
@amirarsalan90 @TheodoreGalanos Interesting work, although this implementation is currently not SOTA. My colleague and I previously added Triton Python Backend support to LMDeploy, where its performance is comparable to, or even slightly better than, the API server. For more details, see https://github.com/InternLM/lmdeploy/pull/1329. I'm not sure whether you'd be interested in implementing something similar in SGLang. Thanks.
@zhyncs Thanks for the suggestion. Unfortunately I'm currently busy and don't have much time to work on this.