
Fix mlserver_huggingface settings device type

Open geodavic opened this issue 2 years ago • 5 comments

If using a model-settings.json of the following form:

{
    "name": "my-model",
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "parameters": {
        "extra": {
            "task": "text-generation",
            "pretrained_model": "model/path",
            "model_kwargs": {
                "load_in_8bit": true
            }
        }
    }
}

an error occurs when spinning up the server:

The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.

because the load_in_8bit model kwarg forces the model to be loaded with Accelerate, while the default HuggingFaceSettings.device value of -1 is still passed to the pipeline. I tried passing "device": null in the model settings JSON to override this default, but it was rejected because the schema types the field as int.

The solution is to expand the typing of HuggingFaceSettings.device to accept null (and string, while we're at it). Therefore I have updated the typing to Optional[Union[int,str]], which is very close to the type hint used in the underlying transformers pipeline.
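To illustrate the typing change, here is a minimal sketch using only the standard library. `DeviceSettings` and `build_pipeline_kwargs` are hypothetical names, not part of MLServer; the real `HuggingFaceSettings` class has more fields and its own validation. Dropping the `device` key when it is `None` is also an assumption made for illustration (the runtime may instead pass `None` straight through).

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class DeviceSettings:
    """Hypothetical stand-in for the relevant part of HuggingFaceSettings."""

    task: str
    pretrained_model: str
    # Widened from `int` so that null (None) and strings such as "cpu"
    # are accepted in addition to integer device indices.
    device: Optional[Union[int, str]] = -1


def build_pipeline_kwargs(settings: DeviceSettings) -> Dict[str, Any]:
    """Collect keyword arguments for a pipeline call (illustrative only)."""
    kwargs: Dict[str, Any] = {
        "task": settings.task,
        "model": settings.pretrained_model,
    }
    # Assumption: leave `device` out entirely when it is None, so that a
    # model placed by Accelerate is not asked to move to a specific device.
    if settings.device is not None:
        kwargs["device"] = settings.device
    return kwargs
```

With `"device": null` in the settings, the resulting kwargs carry no `device` entry, which is the situation the Accelerate error message asks for.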

I have tested the above model settings json with this change and the server starts as expected.

geodavic avatar Nov 15 '23 19:11 geodavic

@adriangonz, please take a look when you have a chance. It's a small change

nanbo-liu avatar Nov 15 '23 19:11 nanbo-liu

@adriangonz Thanks! I added a test that passes None, -1 and "cpu" as the device to ensure that each of them loads the model onto the CPU.
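The expectation behind that test can be sketched roughly as follows. `expected_device` is a hypothetical helper written for illustration; the actual device resolution happens inside the transformers pipeline, and the mapping of `None` to the CPU is assumed here only for a model that was not loaded with Accelerate.

```python
from typing import Optional, Union


def expected_device(device: Optional[Union[int, str]]) -> str:
    """Hypothetical mapping from a settings value to a device name."""
    if device is None:
        # Assumption: with no device given and no Accelerate device map,
        # the model stays on the CPU.
        return "cpu"
    if isinstance(device, int):
        # Negative indices conventionally mean CPU; others index a GPU.
        return "cpu" if device < 0 else f"cuda:{device}"
    # Strings such as "cpu" are taken as-is.
    return device


# The three values mentioned above should all resolve to the CPU.
for value in (None, -1, "cpu"):
    assert expected_device(value) == "cpu"
```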

geodavic avatar Nov 27 '23 17:11 geodavic

@adriangonz :)

geodavic avatar Dec 14 '23 21:12 geodavic

@sakoush if you don't mind taking a look, much appreciated!

geodavic avatar Dec 18 '23 15:12 geodavic

@adriangonz @sakoush

is there anything holding this back?

jyono avatar Jan 16 '24 17:01 jyono