so-vits-svc-fork

"svc pre-sd" - Pyannote.audio does not work or is not accessible

Open hornedpariah opened this issue 2 years ago • 5 comments

"svc pre-sd to split the dataset into multiple files (using pyannote.audio)"

Attempting to install pyannote.audio breaks everything. Even once it is installed, there's a compatibility issue along with a Hugging Face repository issue. I agreed to the terms and was still not able to access the repository. I made a new token afterwards and was denied access again.

hornedpariah • Apr 23 '23 18:04

Seems like there is some issue with the dependency versions used by pyannote.

Qualzz • Apr 23 '23 19:04

I had the same problem. I solved it like this: (NOTE: see the EDIT)

Activate your venv for SVC:

source venv/bin/activate

Then,

git clone https://github.com/pyannote/pyannote-audio

Move the pyannote-audio directory wherever you want.

cd pyannote-audio

Modify pyannote-audio/requirements.txt like this:

Remove everything except:

asteroid-filterbanks >=0.4
einops >=0.6.0
pyannote.core >= 5.0.0
pyannote.database >= 5.0.1
pyannote.metrics >= 3.2
pyannote.pipeline >= 2.3   # 2.4
pytorch_metric_learning >= 2.1.0
speechbrain >= 0.5.14
torch_audiomentations >= 0.11.0
semver >= 3.0.0
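
The trimming step above can also be scripted instead of edited by hand. This is only a minimal sketch, not something shipped with pyannote-audio or so-vits-svc-fork: the helper names `requirement_name` and `filter_requirements` and the `KEEP` set are hypothetical, and the parsing assumes one plain requirement per line.

```python
import re

# Package names kept in the list above. Assumption: names are compared
# case-insensitively, with "-", "_" and "." treated as equivalent.
KEEP = {
    "asteroid-filterbanks", "einops", "pyannote.core", "pyannote.database",
    "pyannote.metrics", "pyannote.pipeline", "pytorch_metric_learning",
    "speechbrain", "torch_audiomentations", "semver",
}


def _normalize(name: str) -> str:
    """Lowercase a package name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> str:
    """Return the normalized package name at the start of a requirement line.

    Returns an empty string for blank lines and comment-only lines.
    """
    stripped = line.split("#", 1)[0].strip()
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", stripped)
    return _normalize(match.group(0)) if match else ""


def filter_requirements(lines, keep=KEEP):
    """Keep only the lines whose package name is in `keep`, in original order."""
    wanted = {_normalize(name) for name in keep}
    return [line for line in lines if requirement_name(line) in wanted]
```

To apply it, read requirements.txt with `splitlines()`, pass the lines through `filter_requirements`, and write the result back. Keeping a backup of the original file is advisable.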

Then,

pip install -r requirements.txt
python setup.py install

And you should be good to go.
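
Before running svc pre-sd, it may help to confirm which versions actually ended up in the venv. The following is a small sketch using only the standard library (Python 3.8+). The helper name `installed_version` is hypothetical, and the list of distributions to check is an assumption based on the packages mentioned in this thread.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions taken from this thread.
    for name in ("pyannote.audio", "torch", "pytorch-lightning"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it inside the activated venv. If pyannote.audio shows as not installed, the install step did not target that environment.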

NOTE: svc pre-sd complains about:

Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.

And it suggests something like:

Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.1.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file /home/steph/.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`

So I did:

python -m pytorch_lightning.utilities.upgrade_checkpoint /home/steph/.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin

In fact, the warning stays... But as far as I can see, it seems to work perfectly.

EDIT: It clearly works, but it does not use the GPU, even though the svc pre-sd --help message seems to imply that it will use VRAM. Is this a consequence of the following warnings?

Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.
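
One way to narrow this down is to check whether torch itself can see the GPU, independently of pyannote. The sketch below does not explain the warnings. It only separates "torch has no CUDA support or no visible GPU" from "the pipeline was never moved to the GPU". The function name `torch_gpu_report` is hypothetical.

```python
def torch_gpu_report() -> dict:
    """Collect basic facts about the installed torch build and GPU visibility."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,          # e.g. a "+cu118" build suffix
        "cuda_build": torch.version.cuda,            # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is True and the GPU still sits idle, the model is most likely just running on the CPU by default.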

sbersier • May 07 '23 12:05

I found a way to make pyannote use the GPU.

Add pipeline = pipeline.to(0) in preprocessing/preprocess_speaker_diarization.py before line 38:

if pipeline is None:
    raise ValueError("Failed to load pipeline")
pipeline = pipeline.to(0)
LOG.info(f"Processing {input_path}. This may take a while...")
diarization = pipeline(
    input_path, min_speakers=min_speakers, max_speakers=max_speakers
)

Knartzpirat • May 11 '23 16:05

@Knartzpirat : Yep! It works for me. Well done! Tested with 56 min. of 44 kHz wav audio. The GPU is used. No crash.

sbersier • May 11 '23 17:05

@Knartzpirat The commit pyannote/pyannote-audio@9d500dc breaks your fix as it doesn't expect to receive an int as the device.

Replacing pipeline.to(0) with pipeline.to(torch.device("cuda")) makes it properly use the GPU again:

if pipeline is None:
    raise ValueError("Failed to load pipeline")
# torch.device needs `import torch` at the top of the file, if it is not already imported
pipeline = pipeline.to(torch.device("cuda"))
LOG.info(f"Processing {input_path}. This may take a while...")
diarization = pipeline(
    input_path, min_speakers=min_speakers, max_speakers=max_speakers
)
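
Moving the pipeline to "cuda" unconditionally is expected to fail on a machine where torch cannot see a GPU. A hedged sketch of a fallback follows. The helper names `pick_device_name` and `move_pipeline` are hypothetical, and it assumes only that the pipeline object exposes `.to(torch.device)` as in the snippet above.

```python
def pick_device_name(cuda_available: bool) -> str:
    """Choose "cuda" when a GPU is usable, otherwise fall back to "cpu"."""
    return "cuda" if cuda_available else "cpu"


def move_pipeline(pipeline):
    """Move a pyannote pipeline to the GPU if torch can see one, else the CPU.

    Assumption: `pipeline.to(...)` accepts a torch.device and returns the
    pipeline, as used in the snippet above.
    """
    import torch  # imported lazily so the pure helper above works without torch

    device = torch.device(pick_device_name(torch.cuda.is_available()))
    return pipeline.to(device)
```

In preprocess_speaker_diarization.py, the line `pipeline = pipeline.to(torch.device("cuda"))` would then become `pipeline = move_pipeline(pipeline)`.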

AmazingSlab • Jun 03 '23 21:06