MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more

Results: 304 MLServer issues

Description: I am trying to load the Phi-2 model using the Hugging Face runtime, but I am encountering an out-of-memory (OOM) error. The GPU I am using is...

The vLLM runtime exposes a wealth of token metrics, for example `prompt_tokens_total` and `generation_tokens_total`. Why does MLServer expose none of these?

Hi, is it expected that we lose the `/invocations` path (the MLflow backward-compatible inference path) when using a per-model tarball environment, e.g. when configured via `model-settings.json`? Note that...
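For context, a per-model tarball environment in MLServer is typically declared in that model's `model-settings.json`. A minimal sketch, assuming the `environment_tarball` parameter and the `mlserver_mlflow.MLflowRuntime` implementation class from MLServer's documentation (field values here are placeholders):

```json
{
  "name": "my-model",
  "implementation": "mlserver_mlflow.MLflowRuntime",
  "parameters": {
    "uri": "./model",
    "environment_tarball": "./environment.tar.gz"
  }
}
```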

Hi, the example HuggingFace pipeline no longer works and fails with the error "Cannot import Conversation from transformers.pipelines": https://github.com/SeldonIO/MLServer/blob/master/docs/examples/huggingface/README.md. As per https://discuss.huggingface.co/t/cannot-import-conversation-from-transformers-utils-py/91556/1, downgrading the `transformers` library to version 4.41.2 (`pip...
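The exact pip invocation is truncated in the report, but a downgrade to the version cited there would presumably look like the following (a sketch, not a confirmed fix for every setup; 4.41.2 is the version named in the linked discussion):

```shell
# Pin transformers to a release that still ships
# transformers.pipelines.Conversation (per the linked thread).
pip install "transformers==4.41.2"
```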