MLServer
An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
Nice! We could probably use this to read other things within MLServer as well, like `model-settings.json` files. _Originally posted by @adriangonz in https://github.com/SeldonIO/MLServer/pull/720#discussion_r965667311_
If this is duplicated from other folders within the root `tests/` package, feel free to move it to the base `conftest.py` BTW _Originally posted by @adriangonz in https://github.com/SeldonIO/MLServer/pull/720#discussion_r965696527_
I'm new to Seldon. When I added some debug statements (using `print()` and also the `logging` module) to the model, I found that they work in `load()` but not...
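For context, a minimal sketch of a custom runtime with debug statements in both hooks; the class name, logger setup and echo response are illustrative, not the issue author's actual model:

```python
import logging

from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput

logger = logging.getLogger(__name__)


class DebugModel(MLModel):
    """Illustrative custom runtime with debug statements in both hooks."""

    async def load(self) -> bool:
        # Statements here are the ones reported to show up in the server logs...
        logger.debug("load() called")
        print("load() called")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # ...while statements here are the ones reported not to appear.
        logger.debug("predict() called with %d inputs", len(payload.inputs))
        print("predict() called")
        output = ResponseOutput(name="echo", shape=[1], datatype="BYTES", data=["ok"])
        return InferenceResponse(model_name=self.name, outputs=[output])
```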
The HuggingFace runtime has a `batch_size` variable in its settings. This should be checked against the MLServer `max_batch_size` setting for consistency.

```json
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "max_batch_size": 5,
  "max_batch_time": 1,
  ...
```
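A sketch of the kind of consistency check the issue asks for, assuming the HuggingFace-specific batch size is passed through `parameters.extra["batch_size"]` (the truncated config above does not show where it lives); this is not the runtime's actual validation code:

```python
from mlserver.settings import ModelSettings


def check_batch_sizes(model_settings: ModelSettings) -> None:
    # Hypothetical helper: compare the HuggingFace pipeline batch size against
    # MLServer's adaptive-batching max_batch_size and flag any mismatch.
    extra = {}
    if model_settings.parameters and model_settings.parameters.extra:
        extra = model_settings.parameters.extra

    hf_batch_size = extra.get("batch_size")
    if hf_batch_size and model_settings.max_batch_size:
        if hf_batch_size != model_settings.max_batch_size:
            raise ValueError(
                f"HuggingFace batch_size ({hf_batch_size}) does not match "
                f"MLServer max_batch_size ({model_settings.max_batch_size})"
            )
```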
The `huggingface_runtime` output JSON serializer does not support basic NumPy datatypes when the data is a dict value.
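A minimal reproduction of the underlying limitation: Python's standard `json` encoder cannot serialize NumPy scalar types nested inside a dict, so any runtime output that keeps raw NumPy values fails at serialization time. The payload below is made up for illustration:

```python
import json

import numpy as np

# A dict value holding a NumPy scalar, as a pipeline might return it.
payload = {"label": "positive", "score": np.float32(0.98)}

try:
    json.dumps(payload)
except TypeError as exc:
    # Object of type float32 is not JSON serializable
    print(exc)
```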
Good morning! I noticed that (in the REST server) request queue management changed from 1.0.0 to 1.1.0, with the latter adding Python queues. I would like to know...
As mentioned in https://github.com/SeldonIO/MLServer/pull/727#discussion_r972003311, the converter from gRPC output is not implemented. The following isn't working:

```python
from mlserver.grpc.converters import ModelInferResponseConverter
from mlserver.codecs.string import StringRequestCodec

inference_response = ModelInferResponseConverter.to_types(response)
...
```
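For context, a sketch of how the `response` object in the snippet above is typically obtained over gRPC, following the style of the custom-json gRPC example; the model name, payload and port are illustrative:

```python
import json

import grpc

from mlserver.grpc import converters, dataplane_pb2_grpc
from mlserver.types import InferenceRequest, Parameters, RequestInput

# Illustrative JSON payload sent as a single BYTES input.
inputs_bytes = json.dumps({"message": "hello"}).encode("UTF-8")

inference_request = InferenceRequest(
    inputs=[
        RequestInput(
            name="echo_request",
            shape=[len(inputs_bytes)],
            datatype="BYTES",
            data=[inputs_bytes],
            parameters=Parameters(content_type="str"),
        )
    ]
)
request = converters.ModelInferRequestConverter.from_types(
    inference_request, model_name="json-hello-world", model_version=None
)

# 8081 is MLServer's default gRPC port.
with grpc.insecure_channel("localhost:8081") as channel:
    stub = dataplane_pb2_grpc.GRPCInferenceServiceStub(channel)
    response = stub.ModelInfer(request)
```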
Hi, we would like to expose the number of elements in the pool's request queue as a metric, for performance reasons. It would be good to get this data...
There is no information in the [documentation](https://mlserver.readthedocs.io/en/latest/examples/custom-json/README.html) about how to retrieve the raw dictionary data back from the MLServer output. I will add a pull request for discussion.
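A sketch of the kind of client-side snippet the missing documentation could cover, based on the V2 inference protocol used by the custom-json example; the model name, URL and payload are illustrative:

```python
import json

import requests

# Build a V2 inference request whose single BYTES input carries a JSON string.
request_payload = {
    "inputs": [
        {
            "name": "echo_request",
            "shape": [1],
            "datatype": "BYTES",
            "data": [json.dumps({"message": "hello"})],
        }
    ]
}

response = requests.post(
    "http://localhost:8080/v2/models/json-hello-world/infer",
    json=request_payload,
)

# The raw dictionary comes back as a JSON string in the first output's data.
raw = response.json()["outputs"][0]["data"][0]
output = json.loads(raw)
print(output)
```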
If the two variables `max_batch_time` and `max_batch_size` are defined in the `model-settings.json`:

```json
{
  "name": "node-1",
  "implementation": "models.NodeOne",
  "max_batch_size": 5,
  "max_batch_time": 1,
  "parameters": {
    "uri": "./fakeuri"
  }
}
```

Then...