MLServer
An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
## Issue

The `mlserver_xgboost` runtime loads models as follows:

https://github.com/SeldonIO/MLServer/blob/6864a2ddf90bc3c81e9bf178b1baeba3931de28a/runtimes/xgboost/mlserver_xgboost/xgboost.py#L24-L34

Whilst the `mlserver_lightgbm` runtime does:

https://github.com/SeldonIO/MLServer/blob/6864a2ddf90bc3c81e9bf178b1baeba3931de28a/runtimes/lightgbm/mlserver_lightgbm/lightgbm.py#L22

The result is that for xgboost we end up with a sklearn API model,...
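For reference, the difference in what gets loaded can be seen with the plain xgboost / lightgbm APIs; a minimal sketch (the model file names are hypothetical):

```python
import xgboost as xgb
import lightgbm as lgb

# xgboost: loading through the sklearn wrapper yields an XGBClassifier/XGBRegressor;
# the underlying raw Booster has to be retrieved via get_booster().
sklearn_model = xgb.XGBClassifier()
sklearn_model.load_model("model.json")   # hypothetical path
raw_booster = sklearn_model.get_booster()  # xgboost.Booster

# lightgbm: loading a saved model directly yields the raw Booster object.
lgb_booster = lgb.Booster(model_file="model.bst")  # hypothetical path
```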
The `TreeSHAP` explainer needs access to the underlying model instance. Therefore, for `TreeSHAP` to work out-of-the-box, we'll need to ensure the following libraries are present in the Alibi Explain runtime...
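As a rough illustration of why the raw model instance matters, `shap.TreeExplainer` is constructed directly from the fitted tree model; a self-contained toy example (the synthetic data is only for demonstration):

```python
import numpy as np
import shap
import xgboost as xgb

# TreeSHAP needs the fitted tree model itself (e.g. an xgboost or lightgbm model),
# not just a predict function.
X = np.random.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)
model = xgb.XGBClassifier(n_estimators=10).fit(X, y)

explainer = shap.TreeExplainer(model)   # works because the model instance is available
shap_values = explainer.shap_values(X)  # per-feature attributions
```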
Following #1234, a future enhancement is to add the cancelled requests back to the queue when a worker dies, so that another worker can pick them up. _Originally posted by...
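A minimal sketch of the idea, assuming a generic asyncio-style dispatcher with a shared request queue (the names below are hypothetical, not MLServer's internals):

```python
import asyncio


async def supervise(worker_task: asyncio.Task, in_flight: list, queue: asyncio.Queue):
    """If the worker dies, push its in-flight requests back onto the shared queue."""
    try:
        await worker_task
    except Exception:
        # Worker crashed: re-enqueue everything it had picked up but not finished,
        # so another worker can process those requests instead of cancelling them.
        for request in in_flight:
            await queue.put(request)
```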
Add support for Triton's new [`ModelStreamInfer` RPC](https://github.com/triton-inference-server/server/blob/8e6628f4f5a9e3dc8f4c718282dc4e76c3587477/docs/protocol/extension_sequence.md?plain=1#L134) extension to the Open Inference Protocol.
First of all: Thank you for all the work that went into version 1.3.x. With version 1.2.4, I used to navigate to `http://localhost:8080/docs` and check which models were loaded, see...
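For what it's worth, one way to check which models are loaded is the model repository index endpoint of the V2 protocol; a rough sketch using `requests`, assuming the default HTTP port and that the repository extension is enabled on your server:

```python
import requests

# List the models known to the server, together with their state.
index = requests.post("http://localhost:8080/v2/repository/index", json={})
for model in index.json():
    print(model["name"], model.get("state"))

# Readiness of a single model can also be checked via the V2 endpoints, e.g.:
# GET /v2/models/{model_name}/ready
```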
In some cases, people may structure their custom inference runtimes as "isolated" Python packages (e.g. with their own `setup.py` / `pyproject.toml`). In these cases, to make sure your local packages...
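For reference, a runtime packaged this way usually just exposes an `MLModel` subclass that the model's `model-settings.json` can point to; a minimal, hypothetical sketch (the package and class names are made up):

```python
# my_runtime/runtime.py  (inside a package installed with e.g. `pip install -e .`)
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Load your actual model artefacts here.
        self._model = lambda x: x
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Echo back the first input as a trivial example.
        first_input = payload.inputs[0]
        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="echo",
                    shape=first_input.shape,
                    datatype=first_input.datatype,
                    data=first_input.data,
                )
            ],
        )
```

The corresponding `model-settings.json` would then set `"implementation": "my_runtime.runtime.MyCustomRuntime"`, with the package installed into the same environment as MLServer.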
Following up on the root cause of #1130, it seems that MLServer metrics not working with `parallel_workers=0` was the reason. As a result, [Custom metrics](https://mlserver.readthedocs.io/en/latest/user-guide/metrics.html#custom-metrics) seem to not be working...
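For context, the custom-metrics flow in the linked docs boils down to registering a metric at load time and logging values per request; a rough sketch, assuming the `mlserver.register` / `mlserver.log` helpers described there:

```python
import mlserver
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class InstrumentedRuntime(MLModel):
    async def load(self) -> bool:
        # Register the custom metric once, when the model is loaded.
        mlserver.register("my_custom_metric", "A counter incremented per request")
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Log a value for the custom metric on every inference call.
        mlserver.log(my_custom_metric=1)
        return InferenceResponse(model_name=self.name, outputs=[])
```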
Currently `TensorDictCodec` is specific to the MLflow runtime, but it is useful beyond just this runtime. Consider moving it one level up so that other runtimes can access it without having to...
In some cases we need to be able to change some of the configuration of deployed models, such as the batch size, on the fly without reloading the model. I...
These predictions are fairly demanding and take ~20s to finish. If many requests hit the microservice concurrently, it starts becoming significantly slower (40s, 50s, 60s...