
numpy.ndarray audio input doesn't work?

Open Purfview opened this issue 10 months ago • 3 comments

Tested versions

pyannote.audio==3.1.1

System information

Windows / CPU

Issue description

If the wrong type is passed to a pipeline, you get this error message:

ValueError: 
Audio files can be provided to the Audio class using different types:
    - a "str" or "Path" instance: "audio.wav" or Path("audio.wav")
    - a "IOBase" instance with "read" and "seek" support: open("audio.wav", "rb")
    - a "Mapping" with any of the above as "audio" key: {"audio": ...}
    - a "Mapping" with both "waveform" and "sample_rate" key:
        {"waveform": (channel, time) numpy.ndarray or torch.Tensor, "sample_rate": 44100}

The message above says that numpy.ndarray is supported, but passing one fails.

Test:

from pyannote.audio import Model
model = Model.from_pretrained(
  "pyannote/segmentation-3.0", 
  use_auth_token="removed")

# Generate dummy audio:
import numpy as np
audio_data = np.sin(2 * np.pi * 440 * np.linspace(0, 60, 60*16000)).astype(np.float32) / 32768.0
# Reshape to "(channel, time)":
audio_data = audio_data.reshape(1, -1)

audio_data = {"waveform": audio_data, "sample_rate": 16000}

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {"min_duration_on": 0.0, "min_duration_off": 0.0}
pipeline.instantiate(HYPER_PARAMETERS)
timecodes = pipeline(audio_data)
print(timecodes)

Error:

    timecodes = pipeline(audio_data)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\pipeline.py", line 325, in __call__
    return self.apply(file, **kwargs)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\pipelines\voice_activity_detection.py", line 211, in apply
    segmentations: SlidingWindowFeature = self._segmentation(
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 425, in __call__
    return self.slide(waveform, sample_rate, hook=hook)
  File "D:\Programs\Python64\lib\site-packages\pyannote\audio\core\inference.py", line 281, in slide
    waveform.unfold(1, window_size, step_size),
AttributeError: 'numpy.ndarray' object has no attribute 'unfold'
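
The traceback shows the failure comes from calling `unfold`, which is a `torch.Tensor` method that `numpy.ndarray` does not have. A possible workaround, sketched here under the assumption that the pipeline accepts a `torch.Tensor` under the "waveform" key as the error message states, is to convert the array before calling the pipeline. `to_pipeline_input` is a hypothetical helper written for this example, not part of pyannote.audio.

```python
import numpy as np
import torch


def to_pipeline_input(waveform: np.ndarray, sample_rate: int) -> dict:
    """Wrap a NumPy waveform in a {"waveform", "sample_rate"} mapping
    whose waveform is a torch.Tensor.

    Hypothetical helper for illustration; not part of pyannote.audio.
    """
    # Accept mono audio given as a flat (time,) array by adding a channel axis.
    if waveform.ndim == 1:
        waveform = waveform[np.newaxis, :]
    if waveform.ndim != 2:
        raise ValueError(
            f"expected a (channel, time) waveform, got shape {waveform.shape}"
        )
    # Make the data float32 and C-contiguous so torch.from_numpy can wrap it.
    waveform = np.ascontiguousarray(waveform, dtype=np.float32)
    return {"waveform": torch.from_numpy(waveform), "sample_rate": sample_rate}


# One minute of a 440 Hz tone at 16 kHz, similar to the dummy audio in the test.
sample_rate = 16000
t = np.linspace(0, 60, 60 * sample_rate, endpoint=False)
audio = to_pipeline_input(np.sin(2 * np.pi * 440 * t), sample_rate)

print(type(audio["waveform"]), tuple(audio["waveform"].shape))
# timecodes = pipeline(audio)  # `pipeline` as instantiated in the test above
```

The conversion shares memory with the NumPy array when no dtype or layout change is needed, so it should add little overhead for audio that is already float32.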

Purfview, Apr 18 '24 23:04

Thanks for the bug report.

Would you mind opening a PR removing the mention of numpy arrays in the error message?

hbredin, Apr 19 '24 06:04

Off-topic question: is it possible to get VAD results faster? For example, the pyannote-onnx implementation is ~5 times faster for me.

> Would you mind opening a PR removing the mention of numpy arrays in the error message?

Done

Purfview, Apr 19 '24 15:04

Thanks for the PR.

Please open a new issue/discussion for your other question.

hbredin, Apr 22 '24 14:04