Triton support

Open TheodoreGalanos opened this issue 1 year ago • 7 comments

Hello, I'm curious whether sglang can already be used as a backend for NVIDIA's Triton Inference Server.

Amazing work with the library btw, love it!

TheodoreGalanos avatar Jan 18 '24 06:01 TheodoreGalanos

I would think so yeah. At least that's the one I was looking at although I don't really know how to do it :)

TheodoreGalanos avatar Jan 23 '24 08:01 TheodoreGalanos

It would be awesome if support for a Triton Inference Server backend were added ...

amirarsalan90 avatar Feb 26 '24 04:02 amirarsalan90

Added a minimal example of serving sglang with Triton Inference Server in this pull request: https://github.com/sgl-project/sglang/pull/242

amirarsalan90 avatar Feb 28 '24 03:02 amirarsalan90

@amirarsalan90 @TheodoreGalanos Interesting work. However, this implementation is currently not SOTA. My colleague and I previously added Triton Python Backend support to LMDeploy, and its performance is comparable to, or even slightly better than, the API Server. For more details, see https://github.com/InternLM/lmdeploy/pull/1329. I'm not sure whether you're interested in implementing this in SGLang. Thanks.

zhyncs avatar Jul 18 '24 15:07 zhyncs

@zhyncs Thanks for the suggestion. Unfortunately I'm currently busy and don't have much time to work on this.

amirarsalan90 avatar Jul 18 '24 21:07 amirarsalan90