MLServer Fix mlserver_huggingface settings device type


Open geodavic opened this issue 2 years ago • 5 comments


If using a model-settings.json of the following form:

{
    "name": "my-model",
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "parameters": {
        "extra": {
            "task": "text-generation",
            "pretrained_model": "model/path",
            "model_kwargs": {
                "load_in_8bit": true
            }
        }
    }
}

an error occurs when spinning up the server:

The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.

because the load_in_8bit model kwarg forces the model to be loaded with Accelerate, while the HuggingFaceSettings.device default value of -1 is still passed to the pipeline. I tried passing "device": null in the model settings JSON to override this default, but it was rejected because the schema types device as int.

The solution is to expand the typing of HuggingFaceSettings.device to accept null (and string, while we're at it). Therefore I have updated the typing to Optional[Union[int,str]], which is very close to the type hint used in the underlying transformers pipeline.
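A minimal sketch of the idea, not the actual MLServer code: DeviceSettings and build_pipeline_kwargs are hypothetical names used only for illustration, and a plain dataclass stands in for the real settings class. It assumes the runtime forwards device to the transformers pipeline, and shows how a None value can be used to leave the argument out so that an Accelerate-loaded model is not moved.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class DeviceSettings:
    # Hypothetical stand-in for HuggingFaceSettings; only `device` is shown.
    # The widened type accepts an index (-1 for CPU), a name ("cpu"), or None.
    device: Optional[Union[int, str]] = -1


def build_pipeline_kwargs(settings: DeviceSettings) -> Dict[str, Any]:
    # Hypothetical helper: collect the keyword arguments for the pipeline.
    kwargs: Dict[str, Any] = {}
    if settings.device is not None:
        # Only forward `device` when the user has not set it to null.
        kwargs["device"] = settings.device
    return kwargs


default_kwargs = build_pipeline_kwargs(DeviceSettings())
null_kwargs = build_pipeline_kwargs(DeviceSettings(device=None))
named_kwargs = build_pipeline_kwargs(DeviceSettings(device="cpu"))
```

With the default settings the kwargs still carry device=-1, so existing configurations behave as before; only an explicit null drops the argument.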

I have tested the above model settings json with this change and the server starts as expected.

Nov 15 '23 19:11 geodavic

@adriangonz, please take a look when you have a chance. It's a small change

Nov 15 '23 19:11 nanbo-liu

@adriangonz Thanks! I added a test that passes None, -1 and "cpu" as the device to ensure they all load the model onto the CPU.

Nov 27 '23 17:11 geodavic

@adriangonz :)

Dec 14 '23 21:12 geodavic

@sakoush if you don't mind taking a look, much appreciated!

Dec 18 '23 15:12 geodavic

@adriangonz @sakoush

is there anything holding this back?

Jan 16 '24 17:01 jyono
