MLServer
An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
Hi, I have just started using MLServer for a new project, but I want to set it up using gRPC. Is there any example or documentation on how to set it up?...
I think I discovered a bug in the current gRPC code in mlserver. I have a model that returns float16 arrays and I tried to get predictions via gRPC. I...
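The excerpt above is cut off before it describes the bug, so the following is only a possible workaround sketch, assuming the problem lies in how float16 values are carried over gRPC. Every float16 value is exactly representable as float32, so a model could upcast its outputs before returning them and a client could downcast again without loss. The helper names below are hypothetical, and the sketch uses only the standard-library `struct` module rather than any MLServer API.

```python
import struct
from typing import List, Sequence


def float16_bytes_to_float32_list(raw: bytes) -> List[float]:
    """Decode little-endian float16 bytes and return the values as float32-rounded floats."""
    count = len(raw) // 2
    # 'e' is the IEEE 754 half-precision format code.
    halves = struct.unpack(f"<{count}e", raw)
    # Pack and unpack as 'f' to show that float32 holds each value exactly.
    return list(struct.unpack(f"<{count}f", struct.pack(f"<{count}f", *halves)))


def float32_list_to_float16_bytes(values: Sequence[float]) -> bytes:
    """Encode values back to little-endian float16 bytes (hypothetical client-side step)."""
    return struct.pack(f"<{len(values)}e", *values)


# Round trip: float16 -> float32 -> float16 leaves the bytes unchanged.
original = struct.pack("<3e", 0.5, 1.5, -2.25)
upcast = float16_bytes_to_float32_list(original)
restored = float32_list_to_float16_bytes(upcast)
```

The upcast doubles the payload size, so this trades bandwidth for compatibility; whether it addresses the reported bug depends on details the excerpt does not show.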
The metrics currently implemented in MLServer are all pure counts of the number of requests. Compared with similar platforms like [Triton Server](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md), many other metrics could be added...
Hi there, the docs show very little information on how to incorporate OpenTelemetry tracing in MLServer, added in this [PR](https://github.com/SeldonIO/MLServer/pull/1281). If for instance I’m deploying this model server within...
Because Japanese mixes phonetic scripts and Chinese characters, special algorithms and dictionaries are needed to run tokenizers for these models. A popular example of this...
```python
from transformers import LlamaForCausalLM, AutoTokenizer, TextGenerationPipeline

model = LlamaForCausalLM.from_pretrained("daryl149/llama-2-7b-hf", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("daryl149/llama-2-7b-hf")
pipeline = TextGenerationPipeline(model, tokenizer)
pipeline("Once upon a time,", max_new_tokens=100, return_full_text=False)
```

`max_new_tokens` and `return_full_text` are extra arguments we...
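One way such extra arguments could travel with a request is the request-level `parameters` object of the V2 inference protocol. The sketch below is hypothetical: `build_request` and `run_pipeline` are illustrative helpers, not MLServer or transformers APIs, and the input name `args` is an assumption. It only shows the idea of forwarding request parameters as keyword arguments to a pipeline-like callable.

```python
from typing import Any, Callable, Dict


def build_request(prompt: str, **extra: Any) -> Dict[str, Any]:
    """Build a V2-style inference request dict (hypothetical helper)."""
    return {
        "inputs": [
            # Assumption: the prompt is sent as a single BYTES element named "args".
            {"name": "args", "shape": [1], "datatype": "BYTES", "data": [prompt]}
        ],
        # Assumption: extra keyword arguments ride in the request-level parameters.
        "parameters": dict(extra),
    }


def run_pipeline(pipeline: Callable[..., Any], request: Dict[str, Any]) -> Any:
    """Call a pipeline-like callable, forwarding request parameters as kwargs."""
    prompt = request["inputs"][0]["data"][0]
    extra = request.get("parameters", {})
    return pipeline(prompt, **extra)


def fake_pipeline(prompt: str, **kwargs: Any) -> Any:
    """Stand-in for a real text-generation pipeline; echoes what it received."""
    return prompt, kwargs


request = build_request("Once upon a time,", max_new_tokens=100, return_full_text=False)
result = run_pipeline(fake_pipeline, request)
```

With a real `TextGenerationPipeline` in place of `fake_pipeline`, the same call would pass `max_new_tokens` and `return_full_text` through unchanged. A real server would also need to decide which parameter names it is safe to forward.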
The swagger docs don't work when the browser cannot access the CDN for swagger js dependencies. Is it possible to configure a custom CDN or serve the dependencies using static...
Hi there, I have an issue with `poetry install` for mlserver. It is caused by tritonclient 2.37+, which now depends on cuda-python; this blocks the mlserver installation if the machine does...
Following https://github.com/SeldonIO/MLServer/pull/1403, it would be great to also support the `CatBoostRegressor` (and subsequently `CatBoostRanker`) model types. From the linked PR: Q: >Looking ahead to adding support for the Regressor and...
# What

When a dataframe is encoded by `PandasCodec.encode_request(use_bytes=True)`, `PandasCodec.decode_request()` cannot restore the exact dataframe.

client code

```py
X = pd.DataFrame(
    dict(
        int_col=[1, 2, 3],
        str_col=["s1", "s2", "s3"],
    )
)
...
```