MLServer
HuggingFace speech models not supported
The MLServer HuggingFace runtime cannot serve speech models in batched mode. The HuggingFace pipeline expects a list of arrays, [(request1), (request2), (request3), (request4), (request5)], where each request is a NumPy array. However, MLServer stacks the NumPy data into a single array of shape (batch_size, input_data). The pipeline interprets this stacked input as multi-channel audio rather than as a batch of single-channel inputs, which results in the following error:
raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")
ValueError: We expect a single channel audio input for AutomaticSpeechRecognitionPipeline
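The shape mismatch can be reproduced with NumPy alone. The sketch below is illustrative only: `check_single_channel` is a hypothetical stand-in for the pipeline's single-channel check (it is not the transformers source), the five requests are synthetic equal-length waveforms (stacking assumes equal lengths), and `unstack_audio_batch` is a hypothetical workaround that splits the batch back into a list of 1-D arrays before the pipeline call, not an existing MLServer function.

```python
import numpy as np

# Hypothetical stand-in for the pipeline's check: each raw-audio input
# is expected to be 1-D (a single channel).
def check_single_channel(audio: np.ndarray) -> None:
    if audio.ndim != 1:
        raise ValueError(
            "We expect a single channel audio input for AutomaticSpeechRecognitionPipeline"
        )

# Five synthetic requests, each a 1-D waveform of the same length.
requests = [np.random.rand(16000).astype(np.float32) for _ in range(5)]

# What the pipeline expects: a list of 1-D arrays, one per request.
for r in requests:
    check_single_channel(r)  # every request passes

# What batching produces: one 2-D array of shape (batch_size, n_samples).
stacked = np.stack(requests)
print(stacked.shape)  # (5, 16000)

# Passed as one input, the 2-D array looks like multi-channel audio.
try:
    check_single_channel(stacked)
except ValueError as err:
    print(err)

# Hypothetical workaround: split the stacked batch back into a list of
# 1-D arrays before handing it to the pipeline.
def unstack_audio_batch(batch: np.ndarray) -> list:
    return [np.asarray(row) for row in batch]

restored = unstack_audio_batch(stacked)
for r in restored:
    check_single_channel(r)  # each element is 1-D again
```

Splitting on the first axis keeps the per-request data intact while restoring the list-of-arrays form the pipeline accepts; where such a step belongs inside the runtime is left open here.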