Whisper model deployment with OVMS

Aditya-Scalers opened this issue on Sep 26, 2023 · 2 comments

Following the reference Whisper implementation with OpenVINO for subtitle generation, I was able to create the whisper_encoder and whisper_decoder xml and bin files. Using whisper_encoder and whisper_decoder as separate models with OVMS, I was able to start the Docker container.
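Roughly, the export followed the pattern below (a simplified sketch, not the exact notebook code; the openai/whisper-base checkpoint, the (1, 80, 3000) mel shape and the output paths are assumptions on my side, taken from the usual Whisper defaults):

import torch
import openvino as ov
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
model.eval()

# The encoder consumes a log-mel spectrogram of shape (batch, n_mels, n_frames).
encoder_ir = ov.convert_model(
    model.model.encoder,
    example_input=torch.zeros((1, 80, 3000)),
)
# OVMS expects <model_name>/<version>/*.xml|*.bin, hence the "1" directory.
ov.save_model(encoder_ir, "models/whisper_encoder/1/whisper_encoder.xml")

# The decoder is exported the same way, but its example input additionally
# includes the token ids, the encoder hidden states and (when exported with a
# KV cache) the past key/value tensors, so the resulting IR has many inputs.

With both models in place, the container starts and reports the model as available: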

New status: { "state": "AVAILABLE", "error_code": "OK" }
[2023-09-26 13:14:16.673][1][serving][info][model.cpp:88] Updating default version for model: whisper, from: 0
[2023-09-26 13:14:16.673][1][serving][info][model.cpp:98] Updated default version for model: whisper, to: 1
[2023-09-26 13:14:16.673][66][modelmanager][info][modelmanager.cpp:1069] Started model manager thread
[2023-09-26 13:14:16.673][1][serving][info][servablemanagermodule.cpp:45] ServableManagerModule started
[2023-09-26 13:14:16.673][67][modelmanager][info][modelmanager.cpp:1088] Started cleaner thread

I am not able to perform inference on these models. Any help would be appreciated.

client code:

from ovmsclient import make_grpc_client

# Connect to the OVMS gRPC endpoint exposed by the container.
client = make_grpc_client("localhost:9000")

# Read the audio file as raw bytes and send it as a single "binary_data" input.
with open("output.bin", "rb") as binary_file:
    binary_data = binary_file.read()

data_dict = {
    "binary_data": binary_data
}
results = client.predict(inputs=data_dict, model_name="whisper")

When I send a request from the client with the binary audio file as input, I get this error.

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ovmsclient/tfs_compat/grpc/serving_client.py", line 47, in predict
    raw_response = self.prediction_service_stub.Predict(request.raw_request, timeout)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/grpc/_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "Invalid number of inputs - Expected: 26; Actual: 1"
        debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:9000 {grpc_message:"Invalid number of inputs - Expected: 26; Actual: 1", grpc_status:3, created_time:"2023-09-27T07:31:16.658787286+00:00"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/cloudvedge/client.py", line 12, in <module>
    results = client.predict(inputs=data_dict, model_name="whisper")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ovmsclient/tfs_compat/grpc/serving_client.py", line 49, in predict
    raise_from_grpc(grpc_error)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ovmsclient/tfs_compat/base/errors.py", line 66, in raise_from_grpc
    raise (error_class(details))
ovmsclient.tfs_compat.base.errors.InvalidInputError: Error occurred during handling the request: Invalid number of inputs - Expected: 26; Actual: 1

Aditya-Scalers · Sep 26 '23 13:09

Hi, you are getting this error because OVMS expects more inputs than you provide. I assume you are using some kind of wrapper around the OV model that hides the fact that the OV model takes many more than one input to perform inference.
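To see exactly what the served model expects, you can query its metadata with the same ovmsclient you already use; roughly (a minimal sketch, assuming the metadata dictionary layout returned by ovmsclient):

from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")

# Lists every input the served model declares, with its dtype and shape.
metadata = client.get_model_metadata(model_name="whisper")
for name, spec in metadata["inputs"].items():
    print(name, spec)

Every one of those inputs has to be present in the dictionary you pass to predict().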

We have plans to support Python code execution inside OVMS, which should ease integration in cases where you already have a Python wrapper.

One thing I also noticed is that you tried to send a binary audio file; right now OVMS only supports images as binary inputs.
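For audio, the usual approach is to do the preprocessing on the client side and send the resulting features as a regular numpy array, keyed by the model's input name. A rough sketch (the "input_features" name and the (1, 80, 3000) shape are assumptions; take the real names and shapes from get_model_metadata):

import numpy as np
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")

# Log-mel spectrogram computed client-side, e.g. with Whisper's feature extractor.
mel = np.zeros((1, 80, 3000), dtype=np.float32)

results = client.predict(
    inputs={"input_features": mel},  # one entry per input the served model expects
    model_name="whisper",            # or whatever name the model is served under
)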

atobiszei · Oct 20 '23 10:10

Hi,

Referring to this link, I also exported the Whisper encoder and decoder models to IR format. I then tested them successfully with some classes from this link.

As a next step, how can I load these IR-format models into OVMS and then run inference against them from a client?

Please advise.

nhha1602 · Nov 17 '23 09:11