
HuggingFace speech models not supported

Open saeid93 opened this issue 3 years ago • 0 comments

The MLServer HuggingFace runtime does not work with speech models in batched mode. The pipeline accepts a list of arrays, [(request1), (request2), (request3), (request4), (request5)], in which each request is a NumPy array. However, MLServer stacks the NumPy data into a single array of shape (batch_size, input_data), which results in the following error when it is sent to the HuggingFace pipeline. The pipeline interprets the batched inputs as one multi-channel input rather than a batch of single-channel inputs.

raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")
ValueError: We expect a single channel audio input for AutomaticSpeechRecognitionPipeline
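
To make the shape mismatch concrete, here is a minimal sketch that uses only NumPy. It assumes the pipeline rejects any input that is not one-dimensional; `check_single_channel` and `unstack_batch` are hypothetical helpers written for illustration, not part of MLServer or HuggingFace, and the 16000-sample request length is an arbitrary choice.

```python
import numpy as np


def check_single_channel(audio: np.ndarray) -> None:
    # Hypothetical stand-in for the pipeline's input validation:
    # anything that is not a 1-D array is treated as multi-channel audio.
    if audio.ndim != 1:
        raise ValueError(
            "We expect a single channel audio input for "
            "AutomaticSpeechRecognitionPipeline"
        )


def unstack_batch(batch: np.ndarray) -> list:
    # Split a (batch_size, num_samples) array back into a list of 1-D arrays.
    return [row for row in batch]


# Five hypothetical requests, each a 1-D array of mono audio samples.
requests = [np.zeros(16000, dtype=np.float32) for _ in range(5)]

# Stacking yields one 2-D array of shape (batch_size, input_data),
# which is the form described in this report.
stacked = np.stack(requests)

# Unstacking restores the list-of-arrays form the pipeline accepts.
unstacked = unstack_batch(stacked)
for audio in unstacked:
    check_single_channel(audio)  # each element is 1-D, so no error is raised
```

Passing `stacked` directly to `check_single_channel` raises the ValueError, while each element of `unstacked` passes. This only works as written when all requests have the same length; whether the runtime should unstack before calling the pipeline is a possible workaround, not a confirmed fix.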

saeid93, Oct 01 '22 15:10