MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more

Results: 304 MLServer issues

Description: I am trying to load the Phi-2 model using the Hugging Face runtime, but I am encountering an out-of-memory (OOM) error. The GPU I am using is...

The vLLM runtime exposes a wealth of token metrics, for example `prompt_tokens_total` and `generation_tokens_total`. Why does MLServer expose none of these?

Hi, is it expected that we lose the `/invocations` path (the MLflow backward-compatible inference path) when using a per-model tarball environment, e.g. when configured via `model-settings.json`? Note that...
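For context, a per-model tarball environment in MLServer is typically declared in that model's `model-settings.json`. A minimal sketch, assuming the `environment_tarball` parameter and the `mlserver_mlflow.MLflowRuntime` implementation class from MLServer's documentation (field values here are placeholders):

```json
{
  "name": "my-model",
  "implementation": "mlserver_mlflow.MLflowRuntime",
  "parameters": {
    "uri": "./model",
    "environment_tarball": "./environment.tar.gz"
  }
}
```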

Hi, the example HuggingFace pipeline no longer works and fails with the error "Cannot import Conversation from transformers.pipelines": https://github.com/SeldonIO/MLServer/blob/master/docs/examples/huggingface/README.md. As per https://discuss.huggingface.co/t/cannot-import-conversation-from-transformers-utils-py/91556/1, downgrading the `transformers` library to version 4.41.2 (`pip...
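The exact pip invocation is truncated in the report, but a downgrade to the version cited there would presumably look like the following (a sketch, not a confirmed fix for every setup; 4.41.2 is the version named in the linked discussion):

```shell
# Pin transformers to a release that still ships
# transformers.pipelines.Conversation (per the linked thread).
pip install "transformers==4.41.2"
```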