pyannote-audio
numpy.ndarray audio input doesn't work?
Tested versions
pyannote.audio==3.1.1
System information
Windows / CPU
Issue description
If the wrong type is passed to a pipeline, you get this error message:
ValueError:
Audio files can be provided to the Audio class using different types:
- a "str" or "Path" instance: "audio.wav" or Path("audio.wav")
- a "IOBase" instance with "read" and "seek" support: open("audio.wav", "rb")
- a "Mapping" with any of the above as "audio" key: {"audio": ...}
- a "Mapping" with both "waveform" and "sample_rate" key:
{"waveform": (channel, time) numpy.ndarray or torch.Tensor, "sample_rate": 44100}
The message says that a (channel, time) numpy.ndarray is accepted as "waveform", but passing one fails.
Test:
import numpy as np
from pyannote.audio import Model
from pyannote.audio.pipelines import VoiceActivityDetection

model = Model.from_pretrained(
    "pyannote/segmentation-3.0",
    use_auth_token="removed")

# Generate 60 s of dummy audio (440 Hz sine sampled at 16 kHz):
audio_data = np.sin(2 * np.pi * 440 * np.linspace(0, 60, 60*16000)).astype(np.float32) / 32768.0
# Reshape to "(channel, time)":
audio_data = audio_data.reshape(1, -1)
audio_data = {"waveform": audio_data, "sample_rate": 16000}

pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {"min_duration_on": 0.0, "min_duration_off": 0.0}
pipeline.instantiate(HYPER_PARAMETERS)
# Fails here: "waveform" is a numpy.ndarray, not a torch.Tensor (see the traceback below)
timecodes = pipeline(audio_data)
print(timecodes)
Error:
timecodes = pipeline(audio_data)
File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\pipeline.py", line 325, in __call__
return self.apply(file, **kwargs)
File "D:\Programs\Python64\lib\site-packages\pyannote\audio\pipelines\voice_activity_detection.py", line 211, in apply
segmentations: SlidingWindowFeature = self._segmentation(
File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 425, in __call__
return self.slide(waveform, sample_rate, hook=hook)
File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 281, in slide
waveform.unfold(1, window_size, step_size),
AttributeError: 'numpy.ndarray' object has no attribute 'unfold'
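The traceback shows that Inference.slide() calls waveform.unfold(...), which is a torch.Tensor method, so the array is passed through unconverted. A plausible workaround (an assumption, not something confirmed in this thread) is to convert the array to a torch.Tensor before calling the pipeline. The helper name as_torch_waveform is hypothetical; only numpy and torch calls are used.

```python
import numpy as np
import torch


def as_torch_waveform(audio: dict) -> dict:
    """Hypothetical helper: return a copy of a {"waveform", "sample_rate"} mapping
    whose waveform is a (channel, time) torch.Tensor instead of a numpy.ndarray."""
    waveform = audio["waveform"]
    if isinstance(waveform, np.ndarray):
        # Make the array contiguous float32 (assumed to match the model's default
        # parameter dtype); torch.from_numpy then wraps it without another copy.
        waveform = torch.from_numpy(np.ascontiguousarray(waveform, dtype=np.float32))
    if waveform.ndim == 1:
        # Promote a mono (time,) signal to the (channel, time) layout the error message asks for.
        waveform = waveform.unsqueeze(0)
    return {**audio, "waveform": waveform}


# A dummy 440 Hz signal similar to the one in the report:
sample_rate = 16000
t = np.linspace(0, 60, 60 * sample_rate)
audio_data = {
    "waveform": np.sin(2 * np.pi * 440 * t).astype(np.float32).reshape(1, -1),
    "sample_rate": sample_rate,
}
fixed = as_torch_waveform(audio_data)
# timecodes = pipeline(fixed)  # requires the model and an access token, as in the test above
```

Returning a new mapping rather than mutating the caller's dict keeps the original numpy array available for other uses.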
Thanks for the bug report.
Would you mind opening a PR removing the mention of numpy arrays in the error message?
Off-topic question: is it possible to get VAD results faster? For example, the pyannote-onnx implementation is about 5 times faster for me.
Would you mind opening a PR removing the mention of numpy arrays in the error message?
Done
Thanks for the PR.
Please open a new issue/discussion for your other question.