
An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more

304 MLServer issues

Right now it seems like the worker does not have a timeout mechanism to handle a request that runs forever. In that sense, if several fat requests came to the workers, they might...
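
A minimal sketch of how a per-request timeout could look, using plain `asyncio.wait_for`; the handler, timeout value and error handling are assumptions, not an existing MLServer setting:

```python
import asyncio

REQUEST_TIMEOUT_S = 30.0  # illustrative value, not an existing MLServer setting


async def run_with_timeout(handler, request):
    """Wrap a (hypothetical) request handler so a single slow request
    cannot occupy a worker forever."""
    try:
        return await asyncio.wait_for(handler(request), timeout=REQUEST_TIMEOUT_S)
    except asyncio.TimeoutError:
        # Surface the timeout to the caller instead of blocking the worker.
        raise RuntimeError(f"Inference timed out after {REQUEST_TIMEOUT_S}s")
```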

Similar to [TFServing](https://www.tensorflow.org/tfx/serving/saved_model_warmup) and [Triton Server](https://github.com/triton-inference-server/server/pull/791), it would be nice to have the option of warming up a model (it has a huge impact on models' latency, especially in...
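
Until such an option exists, a warm-up can be done by hand in a custom runtime's `load()`. A minimal sketch, assuming a runtime with a single `FP32` input (the input name, shape and placeholder `predict()` below are made up):

```python
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, RequestInput, ResponseOutput


class WarmedUpRuntime(MLModel):
    """Hypothetical runtime that warms itself up during load()."""

    async def load(self) -> bool:
        # ... load the real model here ...
        # Run a dummy request through predict() so lazy initialisation
        # (JIT compilation, CUDA context creation, etc.) happens before
        # real traffic arrives. Input name and shape are assumptions.
        warmup_request = InferenceRequest(
            inputs=[
                RequestInput(name="input-0", shape=[1, 3], datatype="FP32", data=[0.0, 0.0, 0.0])
            ]
        )
        await self.predict(warmup_request)
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Placeholder inference; a real runtime would call the loaded model.
        return InferenceResponse(
            model_name=self.name,
            outputs=[ResponseOutput(name="output-0", shape=[1], datatype="FP32", data=[0.0])],
        )
```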

Triton enforces a specific repository layout, which makes it easier to find models by name. We should consider adding an alternative repository implementation that follows Triton's layout - which could...
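
For reference, Triton's repository layout looks roughly like this (model and file names are illustrative; the artefact format depends on the backend):

```
model_repository/
  my-model/                # directory name == model name
    config.pbtxt           # optional model configuration
    1/                     # numeric version sub-directory
      model.onnx           # model artefact
  another-model/
    1/
      model.savedmodel/
```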

Currently, setting debug=true stops ALL logging, which is fine for some use cases, but others may want to still see some logging (ours). The behaviour we would love to see...
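
In the meantime, a rough workaround with standard Python logging is to give your own logger its own handler and stop it propagating to the (possibly silenced) root logger; the logger name and format are illustrative:

```python
import logging

logger = logging.getLogger("my_runtime")  # illustrative logger name
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # don't hand records to the root logger

logger.info("custom runtime logging is still visible")
```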

Hi, I've started using the `decode_args` decorator as I find it super useful! However, is there a way to specify the output name? I see that it's set to `output-0`....
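
For context, a typical `decode_args` usage looks like the sketch below (runtime name and logic are made up); the response output ends up named `output-0` as described in the issue:

```python
import numpy as np

from mlserver import MLModel
from mlserver.codecs import decode_args


class MyRuntime(MLModel):
    async def load(self) -> bool:
        return True

    @decode_args
    async def predict(self, payload: np.ndarray) -> np.ndarray:
        # The returned array is encoded back automatically; the output
        # currently gets the default name "output-0".
        return payload * 2
```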

As far as I understand, the codec-friendly way of sending image/audio files in Seldon is sending them as NumPy arrays. Following the community Slack [discussion-1](https://seldondev.slack.com/archives/C03DQFTFXMX/p1671303812225929) and [discussion-2](https://seldondev.slack.com/archives/C03DQFTFXMX/p1671493747611059?thread_ts=1671475244.549829&cid=C03DQFTFXMX), I ran a...
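
A minimal sketch of sending an image as a tensor over the V2 (Open Inference Protocol) REST endpoint; the model name, input name and local URL are assumptions:

```python
import numpy as np
import requests

image = np.random.rand(1, 224, 224, 3).astype(np.float32)  # stand-in for a decoded image

payload = {
    "inputs": [
        {
            "name": "image",  # assumed input name
            "shape": list(image.shape),
            "datatype": "FP32",
            "data": image.flatten().tolist(),
        }
    ]
}

response = requests.post("http://localhost:8080/v2/models/my-model/infer", json=payload)
print(response.json())
```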

When adaptive batching is enabled, the runtime may return a response with a different batch size than the input one. This results in some individual responses being empty when they go...
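
To illustrate the constraint: with adaptive batching, the batched response should keep one output row per input row so MLServer can split it back into the original requests. A sketch, assuming the `NumpyCodec` helpers and a made-up runtime:

```python
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class BatchSafeRuntime(MLModel):
    """Hypothetical runtime that returns one output row per input row, so the
    batched response can be split back into the original requests."""

    async def load(self) -> bool:
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        batch = NumpyCodec.decode_input(payload.inputs[0])  # shape: (N, ...)
        scores = batch.reshape(batch.shape[0], -1).sum(axis=1)  # still N rows
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output(name="output-0", payload=scores)],
        )
```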

Hi, I have been doing some benchmarking work on `MLServer` custom runtimes vs Python wrapper APIs in `seldon-core`, for the same model and the same resources, and found that a `seldon-core`...

Hey everyone, could we extend the MLflow runtime to allow using `predict_prob()`? I know the MLflow [pyfunc](https://github.com/SeldonIO/MLServer/blob/eb00b083508a2c860689c0c090309e37466b7ea7/runtimes/mlflow/mlserver_mlflow/runtime.py#L155) interface only provides `predict()`, but there are ways to load the model based...
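
One way this can work today, outside the runtime: load the model with the flavour-specific loader instead of `pyfunc`, so the native estimator API (scikit-learn's `predict_proba`) is available. A sketch with an illustrative model URI and feature matrix:

```python
import numpy as np
import mlflow.sklearn

# Load with the sklearn flavour instead of pyfunc, so predict_proba is available.
model = mlflow.sklearn.load_model("models:/my-classifier/Production")

X = np.array([[0.1, 0.2, 0.3]])  # stand-in feature matrix
probabilities = model.predict_proba(X)
print(probabilities)
```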