Saeid Ghafouri
I think a page with an MLServer example should be added to the docs on the [Servers](https://docs.seldon.io/projects/seldon-core/en/latest/nav/config/servers.html) page. I also think that there should be somewhere in the doc (maybe in...
## Describe the bug

As per our conversation over [Slack](https://seldondev.slack.com/archives/C03DQFTFXMX/p1661526920540159?thread_ts=1661438086.861089&cid=C03DQFTFXMX), the metadata example in the documentation does not have the expected behavior and results in a microservice error.

## To...
Academic systems like [Rim](https://dl.acm.org/doi/abs/10.1145/3450268.3453521) and [GrandSLAm](https://dl.acm.org/doi/10.1145/3302424.3303958) can share a model across multiple pipelines. As there are use cases in which a single model could be used as...
The MLServer HuggingFace runtime cannot work with speech models in batched mode, as the pipeline accepts a list of arrays `[(request1), (request2), (request3), (request4), (request5)]` where the type of each...
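To make the failure mode concrete, here is a minimal sketch of why a list of per-request audio arrays does not collapse into a single batched array (the waveform shapes and the stacking step are illustrative assumptions, not taken from the issue):

```python
import numpy as np

# Each request carries one variable-length waveform (hypothetical sizes).
request_1 = np.random.rand(16000)  # ~1 second of 16 kHz audio
request_2 = np.random.rand(24000)  # ~1.5 seconds

# The HuggingFace pipeline accepts a list of arrays, one per request...
batch_for_pipeline = [request_1, request_2]

# ...but stacking them into a single batch tensor fails for
# variable-length audio, so naive tensor batching cannot apply here.
try:
    np.stack(batch_for_pipeline)
except ValueError as err:
    print(err)  # all input arrays must have the same shape
```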
As per https://github.com/SeldonIO/MLServer/pull/740#discussion_r981259626, it would be possible to merge the HuggingFace batch variable into the MLServer batch variable for less redundancy in the `model-settings.json` file.
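As a hypothetical sketch of the proposal, the runtime could derive the pipeline's batch size from MLServer's own adaptive-batching settings, so `model-settings.json` states it only once. The `parameters.extra` layout follows the HuggingFace runtime's settings format; the exact merged shape is an assumption:

```json
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "max_batch_size": 5,
  "max_batch_time": 1,
  "parameters": {
    "extra": {
      "task": "text-generation"
    }
  }
}
```

Here the runtime would read `max_batch_size` for its pipeline batch size instead of requiring a separate `batch_size` entry under `parameters.extra`.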
The HuggingFace runtime has a `batch_size` variable in its settings. This should be checked against the MLServer `max_batch_size` setting for consistency.

```json
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "max_batch_size": 5,
  "max_batch_time": 1,
  ...
}
```
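A minimal sketch of the proposed consistency check, assuming the HuggingFace `batch_size` lives under `parameters.extra` as in the runtime's settings format (the check itself is illustrative, not MLServer code):

```python
import json

with open("model-settings.json") as f:
    settings = json.load(f)

mlserver_batch = settings.get("max_batch_size", 0)
hf_batch = settings.get("parameters", {}).get("extra", {}).get("batch_size", 0)

# Warn or fail when both are set but disagree, since the two batchers
# would otherwise silently work against each other.
if mlserver_batch and hf_batch and mlserver_batch != hf_batch:
    raise ValueError(
        f"HuggingFace batch_size ({hf_batch}) does not match "
        f"MLServer max_batch_size ({mlserver_batch})"
    )
```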
As mentioned in https://github.com/SeldonIO/MLServer/pull/727#discussion_r972003311, the converter from gRPC output is not implemented. The following lines therefore don't work:

```python
from mlserver.grpc.converters import ModelInferResponseConverter
from mlserver.codecs.string import StringRequestCodec

inference_response = ModelInferResponseConverter.to_types(response)
...
```
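For context, this is a sketch of the round trip the snippet belongs to, following the pattern in MLServer's custom-json example; the model name, input name, and port are assumptions, and the last three lines are the part blocked by the unimplemented converter:

```python
import json

import grpc

from mlserver import types
from mlserver.codecs.string import StringRequestCodec
from mlserver.grpc.converters import (
    ModelInferRequestConverter,
    ModelInferResponseConverter,
)
import mlserver.grpc.dataplane_pb2_grpc as dataplane

# Build a request whose payload is a JSON document encoded as a string.
inputs_bytes = json.dumps({"hello": "world"}).encode("UTF-8")
inference_request = types.InferenceRequest(
    inputs=[
        types.RequestInput(
            name="echo_request",
            shape=[len(inputs_bytes)],
            datatype="BYTES",
            data=[inputs_bytes],
            parameters=types.Parameters(content_type="str"),
        )
    ]
)
request = ModelInferRequestConverter.from_types(
    inference_request, model_name="json-model", model_version=None
)

# Call the model over gRPC (default MLServer gRPC port).
channel = grpc.insecure_channel("localhost:8081")
stub = dataplane.GRPCInferenceServiceStub(channel)
response = stub.ModelInfer(request)

# Convert the gRPC response back to MLServer types and decode the JSON.
inference_response = ModelInferResponseConverter.to_types(response)
raw_json = StringRequestCodec.decode_response(inference_response)
print(json.loads(raw_json[0]))
```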
There is no information in the [documentation](https://mlserver.readthedocs.io/en/latest/examples/custom-json/README.html) about how to retrieve the raw dictionary data back from MLServer output. I will open a pull request for discussion.
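As a sketch of what that documentation could show, the raw dictionary can be recovered by parsing the string payload in the response; the endpoint, model name, and input name below are assumptions based on the custom-json example's setup:

```python
import json

import requests

# Send a JSON document encoded as a string input (Open Inference Protocol).
inputs_string = json.dumps({"hello": "world"})
inference_request = {
    "inputs": [
        {
            "name": "echo_request",
            "shape": [len(inputs_string)],
            "datatype": "BYTES",
            "data": [inputs_string],
        }
    ]
}
endpoint = "http://localhost:8080/v2/models/json-model/infer"
response = requests.post(endpoint, json=inference_request)

# The output data holds the JSON string; parse it back into a dict.
raw = response.json()["outputs"][0]["data"][0]
output_dict = json.loads(raw)
print(output_dict)
```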
If the two variables `max_batch_time` and `max_batch_size` are defined in `model-settings.json`:

```json
{
  "name": "node-1",
  "implementation": "models.NodeOne",
  "max_batch_size": 5,
  "max_batch_time": 1,
  "parameters": {
    "uri": "./fakeuri"
  }
}
```

Then...
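For reference, a minimal sketch of what the `models.NodeOne` runtime behind those settings could look like (the implementation is assumed, not taken from the issue). With the settings above, MLServer's adaptive batcher coalesces up to 5 requests arriving within 1 second into a single `predict()` call:

```python
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class NodeOne(MLModel):
    async def load(self) -> bool:
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # With adaptive batching enabled, `payload` may already contain
        # several coalesced requests stacked along the batch dimension.
        data = NumpyCodec.decode_input(payload.inputs[0])
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output(name="output-0", payload=data)],
        )
```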