ScaleLLM inspiration
Motivation
Hi all. @lvhan028 @lzhangzz @grimoire @irexyc I recently came across an interesting project, ScaleLLM. Its positioning is similar to that of most current open-source LLM serving frameworks, and it integrates libraries such as FlashInfer. According to its README, the main focus is high performance, along with comprehensive model support.
The key point is that, although it is implemented mainly in C++, it does well on flexibility and production deployment. The README puts it this way:
Production Ready: Engineered with production environments in mind, ScaleLLM is equipped with robust system monitoring and management features to ensure a seamless deployment experience.
This may give us some inspiration. Currently LMDeploy, and the TurboMind engine in particular (which is also developed in C++), has made almost no progress on robust system monitoring, and it also supports relatively few models.
Does the community currently have a plan to improve production readiness in terms of model support and system monitoring? Thanks.
Related resources
No response
Additional context
No response