MLServer
Fix mlserver_huggingface settings device type
If using a model-settings.json of the following form:
{
  "name": "my-model",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "text-generation",
      "pretrained_model": "model/path",
      "model_kwargs": {
        "load_in_8bit": true
      }
    }
  }
}
an error occurs when spinning up the server:
The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.
This happens because the load_in_8bit model kwarg forces the model to be loaded with Accelerate, while the default HuggingFaceSettings.device value of -1 is still passed to the pipeline. I tried passing "device": null in the model settings JSON to avoid this default value, but it was rejected because the schema types the field as int.
The solution is to widen the type of HuggingFaceSettings.device so that it accepts null (and strings, while we're at it). I have therefore updated the type to Optional[Union[int, str]], which is very close to the type hint used in the underlying transformers pipeline.
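To illustrate the idea, here is a minimal sketch of the widened type, using only the standard library. `SettingsSketch` and `build_pipeline_kwargs` are hypothetical stand-ins, not the actual MLServer classes or loader code; the sketch assumes the loader could simply omit `device` when it is `None`, so that Accelerate is free to place the model itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union

# Widened type: an int index (e.g. -1 for CPU), a string such as "cpu", or None.
DeviceType = Optional[Union[int, str]]


@dataclass
class SettingsSketch:
    """Hypothetical stand-in for HuggingFaceSettings (illustration only)."""

    task: str
    pretrained_model: str
    device: DeviceType = -1  # default value described in this PR


def build_pipeline_kwargs(settings: SettingsSketch) -> Dict[str, Any]:
    """Hypothetical helper: collect kwargs for a transformers pipeline call.

    Assumption: when device is None the key is left out entirely, so no
    explicit device is requested for a model loaded with Accelerate.
    """
    kwargs: Dict[str, Any] = {
        "task": settings.task,
        "model": settings.pretrained_model,
    }
    if settings.device is not None:
        kwargs["device"] = settings.device
    return kwargs
```

With this shape, `"device": null` in the settings maps to `None` and no device is forwarded, while `-1` and `"cpu"` are passed through unchanged.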
I have tested the above model settings json with this change and the server starts as expected.
@adriangonz, please take a look when you have a chance. It's a small change
@adriangonz Thanks! I added a test that passes None, -1 and cpu for the device to ensure they all load into cpu.
@adriangonz :)
@sakoush if you don't mind taking a look, much appreciated!
@adriangonz @sakoush Is there anything holding this back?