`ValueError: negative dimensions are not allowed` when tuning the optimizer.

Open sucv opened this issue 1 year ago • 10 comments

Hi, I got an error while fine-tuning a pipeline on my own dataset, following the tutorial. `trainer.fit()` ran fine, but the error occurred at `for i, iteration in enumerate(iterations):`.

I searched the existing issues but didn't find a fix (it doesn't seem to be this one). I tried cutting the run down to about 5 iterations with `if i > 5: break`, but the error still happens. Do you have any suggestions for pinpointing the cause? Is there a workaround? Thanks!

[W 2023-10-14 02:25:02,236] Trial 0 failed with value None.
Traceback (most recent call last):
  File "/mnt/e/Ubuntu/Speech_Diarization/main2.py", line 127, in <module>
    for i, iteration in enumerate(iterations):
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/pyannote/pipeline/optimizer.py", line 374, in tune_iter
    self.study_.optimize(objective, n_trials=1, timeout=None, n_jobs=1)
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/optuna/study/study.py", line 442, in optimize
    _optimize(
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/optuna/study/_optimize.py", line 66, in _optimize
    _optimize_sequential(
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/optuna/study/_optimize.py", line 251, in _run_trial
    raise func_err
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/pyannote/pipeline/optimizer.py", line 227, in objective
    output = pipeline(input)
             ^^^^^^^^^^^^^^^
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/pyannote/audio/core/pipeline.py", line 325, in __call__
    return self.apply(file, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 540, in apply
    discrete_diarization = self.reconstruct(
                           ^^^^^^^^^^^^^^^^^
  File "/home/su012/miniconda3/envs/sd/lib/python3.11/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 399, in reconstruct
    clustered_segmentations = np.NAN * np.zeros(
                                       ^^^^^^^^^
ValueError: negative dimensions are not allowed

pipeline = pipe_SpeakerDiarization(
    segmentation=finetuned_model,
    clustering="OracleClustering",
).to(device)

# as reported in the technical report, min_duration_off can safely be set to 0.0
pipeline.freeze({"segmentation": {"min_duration_off": 0.0}})

optimizer = Optimizer(pipeline)
dev_set = list(dataset.development())

iterations = optimizer.tune_iter(dev_set, show_progress=True)
best_loss = 1.0

for i, iteration in enumerate(iterations):
    print(f"Best segmentation threshold so far: {iteration['params']['segmentation']['threshold']}")
    if i > 5: break  # 50 iterations should give slightly better results
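
A side note on the loop bound: `if i > 5: break` lets seven iterations run (i = 0 through 6) before it stops. The sketch below compares that pattern with `itertools.islice`, which caps a generator at an exact count. `fake_tune_iter` is a hypothetical stand-in for an endless tuning generator, not the pyannote API.

```python
from itertools import islice


def fake_tune_iter():
    """Hypothetical stand-in for an endless tuning generator (NOT the pyannote API)."""
    trial = 0
    while True:
        trial += 1
        yield {"trial": trial}


# Pattern from the snippet above: the break only fires once i reaches 6.
seen_with_break = []
for i, iteration in enumerate(fake_tune_iter()):
    seen_with_break.append(iteration["trial"])
    if i > 5:
        break

# islice stops after exactly 5 items, with no index bookkeeping.
seen_with_islice = [it["trial"] for it in islice(fake_tune_iter(), 5)]

print(len(seen_with_break), len(seen_with_islice))
```

Either form is fine for a quick test; the difference only matters when each iteration is expensive.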

sucv avatar Oct 14 '23 01:10 sucv

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory

github-actions[bot] avatar Oct 14 '23 01:10 github-actions[bot]

I checked the wav files in the development set one by one and found that some of them contain no speech at all. I will remove those files and try again.
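
A check like this can be scripted rather than done by hand. The sketch below is a minimal example that assumes the reference annotations are stored as one `.rttm` file per recording in a single directory, with one `SPEAKER` row per speaker turn. The helper names `count_speaker_turns` and `files_with_few_turns` are hypothetical and not part of pyannote.

```python
from pathlib import Path


def count_speaker_turns(rttm_path):
    """Count the 'SPEAKER' rows of an RTTM file (one row per speaker turn)."""
    with open(rttm_path, encoding="utf-8") as f:
        # Blank lines split to [], so they never match ["SPEAKER"].
        return sum(1 for line in f if line.split()[:1] == ["SPEAKER"])


def files_with_few_turns(rttm_dir, min_turns=1):
    """Return names of RTTM files holding fewer than `min_turns` speaker turns.

    With the default min_turns=1 this lists files with no speech at all.
    ASSUMPTION: annotations live as *.rttm files directly inside `rttm_dir`.
    """
    return sorted(
        path.name
        for path in Path(rttm_dir).glob("*.rttm")
        if count_speaker_turns(path) < min_turns
    )
```

Calling `files_with_few_turns("path/to/rttm_dir")` on the development annotations would list the recordings to exclude; raising `min_turns` also flags files with very few rows.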

sucv avatar Oct 14 '23 08:10 sucv

What version of pyannote.audio are you using? This issue should be fixed in latest release (3.0.1).

hbredin avatar Oct 14 '23 09:10 hbredin

Thanks for the prompt response. I downloaded the latest version and called the diarization-3.0 pipeline.

sucv avatar Oct 14 '23 10:10 sucv

What does this print?

from pyannote.audio import __version__
print(__version__)

hbredin avatar Oct 14 '23 10:10 hbredin

3.0.1

sucv avatar Oct 14 '23 12:10 sucv

I wonder why the early-exit check `if np.nanmax(count.data) == 0.0:` was passed without returning, so that `hard_clusters[inactive_speakers] = -2` then sets every entry to -2 in speaker_diarization.py.

I set a breakpoint on `hard_clusters[inactive_speakers] = -2` and found that the RTTM (annotation) of the file that leads to the issue contains only two rows. The same file does not cause an error when I feed it directly to a pipeline following the example here.

As a workaround, I now require every annotation file to have more than 3 rows. I will report back soon.

# exit early when no speaker is ever active
if np.nanmax(count.data) == 0.0:
    diarization = Annotation(uri=file["uri"])
    if return_embeddings:
        return diarization, np.zeros((0, self._embedding.dimension))

    return diarization

# binarize segmentation
if self._segmentation.model.specifications.powerset:
    binarized_segmentations = segmentations
else:
    binarized_segmentations: SlidingWindowFeature = binarize(
        segmentations,
        onset=self.segmentation.threshold,
        initial_state=False,
    )

if self.klustering == "OracleClustering" and not return_embeddings:
    embeddings = None
else:
    embeddings = self.get_embeddings(
        file,
        binarized_segmentations,
        exclude_overlap=self.embedding_exclude_overlap,
        hook=hook,
    )
    hook("embeddings", embeddings)
    #   shape: (num_chunks, local_num_speakers, dimension)

hard_clusters, _, centroids = self.clustering(
    embeddings=embeddings,
    segmentations=binarized_segmentations,
    num_clusters=num_speakers,
    min_clusters=min_speakers,
    max_clusters=max_speakers,
    file=file,  # <== for oracle clustering
    frames=self._frames,  # <== for oracle clustering
)
# hard_clusters: (num_chunks, num_speakers)
# centroids: (num_speakers, dimension)

# reconstruct discrete diarization from raw hard clusters

# keep track of inactive speakers
inactive_speakers = np.sum(binarized_segmentations.data, axis=1) == 0
#   shape: (num_chunks, num_speakers)

hard_clusters[inactive_speakers] = -2
discrete_diarization = self.reconstruct(
    segmentations,
    hard_clusters,
    count,
)
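
To illustrate how an all `-2` assignment could end in the reported `ValueError`, here is a minimal NumPy sketch. It assumes that the reconstruction step derives the number of clusters as the largest label plus one; that is an assumption about the code path, not a quote of the library. The array sizes are toy values.

```python
import numpy as np

# Toy sizes, chosen only for illustration.
num_chunks, local_num_speakers, num_frames = 4, 3, 10

# Cluster labels as a clustering step might return them.
hard_clusters = np.zeros((num_chunks, local_num_speakers), dtype=np.int8)

# Suppose every (chunk, speaker) pair is flagged as inactive.
inactive_speakers = np.ones((num_chunks, local_num_speakers), dtype=bool)
hard_clusters[inactive_speakers] = -2

# ASSUMPTION: the cluster count is computed as max label + 1.
num_clusters = int(np.max(hard_clusters)) + 1  # -2 + 1 == -1

try:
    np.zeros((num_chunks, num_frames, num_clusters))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(num_clusters, error_message)
```

Under that assumption, any file whose speakers are all marked inactive after binarization would yield a negative last dimension, which matches the traceback above.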

sucv avatar Oct 14 '23 17:10 sucv

Can you please share the file and pipeline parameters that cause this issue?

hbredin avatar Oct 14 '23 18:10 hbredin

I packaged the minimum code and data.

To set up the environment, first change the Hugging Face token in main.py, then run the commands below. Note that the bug does not occur on every run; please run main.py several times until it appears.

conda create -n sd python=3.11
conda activate sd
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install pytorch-lightning
pip install pyannote.audio
python main.py

bug.zip

sucv avatar Oct 15 '23 02:10 sucv

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 13 '24 04:04 stale[bot]