sglang Triton support

Open TheodoreGalanos opened this issue 1 year ago • 7 comments

Hello, curious if we can already use sglang as a backend for NVIDIA's Triton Server.

Amazing work with the library btw, love it!

Jan 18 '24 06:01 TheodoreGalanos

Would it be similar to how vLLM is used as a backend for Triton?

From this tutorial on vLLM on Triton

Jan 22 '24 18:01 isaac-vidas

I would think so, yeah. At least that's the one I was looking at, although I don't really know how to do it :)

Jan 23 '24 08:01 TheodoreGalanos

It would be awesome if support for a Triton Inference Server backend was added ...

Feb 26 '24 04:02 amirarsalan90

I added a minimal example for serving sglang with Triton Inference Server in this pull request: https://github.com/sgl-project/sglang/pull/242
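For readers who want a rough picture of what such a setup can look like, here is a minimal sketch of a Triton Python backend `model.py` that forwards prompts to a separately launched sglang server. It is not the code from the pull request. The tensor names `PROMPT` and `COMPLETION`, the URL and port, and the helper names `build_payload` and `call_sglang` are assumptions made for illustration, and the tensor names would have to match whatever is declared in `config.pbtxt`.

```python
import json
import urllib.request

try:
    # Both imports are only needed (and pb_utils is only available)
    # inside a Triton Python backend process.
    import numpy as np
    import triton_python_backend_utils as pb_utils
except ImportError:
    np = None
    pb_utils = None

# Assumption: an sglang server was started separately and listens here.
SGLANG_URL = "http://localhost:30000/generate"


def build_payload(prompt, max_new_tokens=64, temperature=0.0):
    """Build the JSON body for sglang's native /generate endpoint."""
    return {
        "text": prompt,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


def call_sglang(prompt, url=SGLANG_URL):
    """POST one prompt to the sglang server and return the generated text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]


class TritonPythonModel:
    """Entry point class that the Triton Python backend looks for."""

    def execute(self, requests):
        responses = []
        for request in requests:
            # Hypothetical tensor names; they must match config.pbtxt.
            tensor = pb_utils.get_input_tensor_by_name(request, "PROMPT")
            prompt = tensor.as_numpy().flatten()[0].decode("utf-8")
            completion = call_sglang(prompt)
            out = pb_utils.Tensor(
                "COMPLETION",
                np.array([completion.encode("utf-8")], dtype=np.object_),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

This sketch handles one request at a time and blocks on each HTTP call, so it is meant to show the wiring rather than to be fast.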

Feb 28 '24 03:02 amirarsalan90

@amirarsalan90 @TheodoreGalanos Interesting work, although this implementation is not currently SOTA. My colleague and I previously added Triton Python Backend support to LMDeploy, and its performance is comparable to, or even slightly better than, the API Server. For more details, you can refer to https://github.com/InternLM/lmdeploy/pull/1329. I'm not sure if you're interested in implementing it in SGLang. Thanks.

Jul 18 '24 15:07 zhyncs

@zhyncs Thanks for the suggestion. Unfortunately I'm currently busy and don't have much time to work on this.

Jul 18 '24 21:07 amirarsalan90
