Why is TorchServe No Longer Actively Maintained?
Hello, I noticed that the TorchServe GitHub page has been marked as 'Limited Maintenance,' indicating that the project is no longer actively maintained. Could you share the reasons behind this decision? Is it related to the development direction of the PyTorch ecosystem? Additionally, are there any recommended alternative tools or solutions for deploying PyTorch models?
Thank you for your response!
Could you clarify your decision? What do you plan to use in the future?
If torchserve is no longer the way to serve PyTorch models, what else is out there?
The best like-for-like replacement right now is probably NVIDIA Triton with the PyTorch backend, I think.
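For anyone trying that route: Triton serves models out of a model repository, where each model gets a versioned directory plus a `config.pbtxt`. A minimal sketch for a TorchScript model (the model name, shapes, and dims here are placeholders for your own model):

```
# Repository layout (the TorchScript file must be named model.pt):
#   models/my_model/1/model.pt
#   models/my_model/config.pbtxt

name: "my_model"
backend: "pytorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"       # the PyTorch backend expects INPUT__<index> names
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Then launch with `tritonserver --model-repository=models`.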
Developer of LitServe here. LitServe has a similar API and on-par performance, so it's easy to port your application: https://lightning.ai/docs/litserve/home/benchmarks
Just want to share a list of resources to go from here...
- Ray Serve (https://docs.ray.io/en/latest/serve/index.html)
- Triton Inference Server (https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html)
- MLflow (https://mlflow.org/)
- BentoML (https://www.bentoml.com/)
- KServe (https://kserve.github.io/website/latest/)
- Seldon (https://www.seldon.io/solutions/seldon-mlserver/)
- Cortex (https://www.cortex.dev/)
- ForestFlow (https://github.com/ForestFlow/ForestFlow)
- TensorFlow Serving (https://www.tensorflow.org/tfx/guide/serving)
- DeepDetect (https://www.deepdetect.com/overview/introduction)
- Multi Model Server (MMS) (https://github.com/awslabs/multi-model-server)
So far I looked at BentoML and RayServe. Here are my thoughts:
- BentoML seems to have a paywall: BentoML Cloud, which you cannot self-host. But it has nice features like model management and a Triton Inference Server interface.
- Ray Serve also has a Triton inference runtime and model management.
- In LitServe I cannot find any model management (?)
If you are also coming from TorchServe and, like me =), value open source, the packaging, and the management API, feel free to share your experiences/research, or correct or add anything. I'm still searching/researching.
Cheers.
Does anyone have any recommendations for alternative frameworks that allow per-model user-provided code like torchserve's handler.py?
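For context, what I mean by per-model code is TorchServe's handler contract: each model ships its own handler.py implementing a few hooks. Roughly like this (a simplified plain-Python sketch, not the real BaseHandler; `ToyHandler` and its doubling "model" are made up for illustration):

```python
# Simplified sketch of a TorchServe-style handler. A real handler loads
# weights in initialize() using paths from the context object.

class ToyHandler:
    def initialize(self, context):
        # `context` would carry model dir, GPU id, etc.; toy model here.
        self.model = lambda batch: [x * 2 for x in batch]

    def preprocess(self, data):
        # Turn raw request payloads into model input.
        return [item["value"] for item in data]

    def inference(self, batch):
        return self.model(batch)

    def postprocess(self, outputs):
        # One response entry per request item.
        return [{"result": y} for y in outputs]

    def handle(self, data, context):
        return self.postprocess(self.inference(self.preprocess(data)))


handler = ToyHandler()
handler.initialize(context=None)
print(handler.handle([{"value": 3}], context=None))  # [{'result': 6}]
```

I'm looking for a framework where I can keep supplying this kind of per-model preprocess/inference/postprocess code.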
The "no longer actively maintained" notice should include a date. That's especially true for the documentation at pytorch.org/serve.
I shouldn't have to dig to find out how recently the project was abandoned. Thankfully, Google found this repo, and the release list shows the latest release was in September 2024.
Does anyone have any recommendations for alternative frameworks that allow per-model user-provided code like torchserve's handler.py?
Hi @geodavic, I’d recommend LitServe as a great alternative. As a contributor, I can say it offers a user-friendly interface for serving models with excellent performance. Feel free to try it out and let me know if you have any questions! 😊
@bhimrazy What I miss from TorchServe is its support for separate endpoints and different GPU configurations for multiple models. LitServe doesn't support this, according to https://lightning.ai/docs/litserve/features/multiple-endpoints#multiple-routes.
Hi @yuzhichang,
By design, LitServe is kept simple yet performant.
Btw, you can easily configure devices, GPUs, and workers while setting up the LitServer (see: LitServer Devices).
For multiple endpoints, I’d suggest creating a Docker image for each endpoint and serving them that way. If you’d like to share any thoughts or use cases on multiple endpoints, feel free to add them to this issue: #271. 😊
Just want to share a list of resources to go from here...
...
If you are also coming from TorchServe and, like me =), value open source, the packaging, and the management API, feel free to share your experiences/research, or correct or add anything. I'm still searching/researching.
Cheers.
@whplh This list is missing OpenVINO Model Server.
If you plan to deploy on Intel CPUs, iGPUs, GPUs, or NPUs, this is the way to go. It supports popular APIs like KServe (both gRPC and REST) and also the OpenAI API if you plan on serving LLMs. And PyTorch models will work out of the box once converted to ONNX or OpenVINO IR format.