MLServer
Check for consistency between MLServer and HuggingFace batch size
The HuggingFace runtime has a batch_size variable in its settings (under parameters.extra). This value should be checked against the MLServer max_batch_size setting for consistency. For example, the following model settings define both values:
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "max_batch_size": 5,
  "max_batch_time": 1,
  "parameters": {
    "extra": {
      "task": "text-generation",
      "pretrained_model": "distilgpt2",
      "device": 0,
      "batch_size": 5
    }
  }
}
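
A minimal sketch of what such a check could look like, written against the plain settings dictionary rather than MLServer's own classes. The helper name `check_batch_sizes` is hypothetical and not part of MLServer; the sketch also assumes that "consistent" means the two values are equal, which is one possible interpretation of this request.

```python
import json


def check_batch_sizes(model_settings: dict) -> list:
    """Return a list of warning strings; empty if the settings look consistent.

    Hypothetical helper (not an MLServer API). Assumes "consistent" means
    max_batch_size == parameters.extra.batch_size.
    """
    warnings = []
    max_batch_size = model_settings.get("max_batch_size")
    extra = (model_settings.get("parameters") or {}).get("extra") or {}
    hf_batch_size = extra.get("batch_size")

    # If either value is missing there is nothing to compare.
    if max_batch_size is None or hf_batch_size is None:
        return warnings

    if max_batch_size != hf_batch_size:
        warnings.append(
            "max_batch_size (%s) does not match HuggingFace batch_size (%s)"
            % (max_batch_size, hf_batch_size)
        )
    return warnings


# The settings from this issue, trimmed to the relevant fields.
settings = json.loads("""
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "max_batch_size": 5,
  "max_batch_time": 1,
  "parameters": {"extra": {"task": "text-generation", "batch_size": 5}}
}
""")

for message in check_batch_sizes(settings):
    print("WARNING:", message)
```

With the settings above both values are 5, so the loop prints nothing. Whether a mismatch should only log a warning or raise an error at model load time is left open here.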