so-vits-svc-fork
"svc pre-sd" - Pyannote.audio does not work or is not accessible
"svc pre-sd to split the dataset into multiple files (using pyannote.audio)"
Attempting to install pyannote.audio breaks everything. Even once it is installed, there's a compatibility issue along with a Hugging Face repository issue: I agreed to the terms and was still not able to access the repository. I made a new token afterwards and was denied access again.
Seems like there is some issue with the dependency versions used by pyannote.
I had the same problem. I solved it like this: (NOTE: see the EDIT)
Activate your venv for SVC:
source venv/bin/activate
Then,
git clone https://github.com/pyannote/pyannote-audio
Move the pyannote-audio directory wherever you want.
cd pyannote-audio
Modify pyannote-audio/requirements.txt like this:
Remove everything except:
asteroid-filterbanks >=0.4
einops >=0.6.0
pyannote.core >= 5.0.0
pyannote.database >= 5.0.1
pyannote.metrics >= 3.2
pyannote.pipeline >= 2.3 # 2.4
pytorch_metric_learning >= 2.1.0
speechbrain >= 0.5.14
torch_audiomentations >= 0.11.0
semver >= 3.0.0
Then,
pip install -r requirements.txt
python setup.py install
And you should be good to go.
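If you don't want to edit requirements.txt by hand, the trimming step above can be scripted. This is only a rough sketch: the function names (filter_requirements, rewrite_requirements) are made up for illustration, and it assumes the upstream file uses one plain "name >= version" requirement per line. The list of kept packages is the one from this comment.

```python
import re
from pathlib import Path

# Packages to keep, copied from the list in this comment.
KEEP = {
    "asteroid-filterbanks",
    "einops",
    "pyannote.core",
    "pyannote.database",
    "pyannote.metrics",
    "pyannote.pipeline",
    "pytorch_metric_learning",
    "speechbrain",
    "torch_audiomentations",
    "semver",
}


def _normalize(name: str) -> str:
    # Treat "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", name).lower()


_KEEP_NORMALIZED = {_normalize(n) for n in KEEP}


def filter_requirements(text: str) -> str:
    """Return only the requirement lines whose package name is in KEEP."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        # The package name is everything before the first space,
        # version operator, extras bracket, marker, or inline comment.
        name = re.split(r"[\s<>=!~;\[#]", stripped, maxsplit=1)[0]
        if _normalize(name) in _KEEP_NORMALIZED:
            kept.append(stripped)
    return "\n".join(kept) + "\n"


def rewrite_requirements(path: Path) -> None:
    """Overwrite the requirements file in place with the filtered version."""
    path.write_text(filter_requirements(path.read_text()))
```

Run from inside the pyannote-audio checkout, rewrite_requirements(Path("requirements.txt")) would replace the file before the pip install step. Keep a copy of the original file if you want to compare afterwards.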
NOTE:
svc pre-sd complains about:
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.
And it suggests something like:
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.1.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file /home/steph/.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin` (utils.py:126)
So I did:
python -m pytorch_lightning.utilities.upgrade_checkpoint /home/steph/.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin
The warning still appears after that, but as far as I can see, it works perfectly.
EDIT: It clearly works, but it doesn't use the GPU, even though the svc pre-sd --help message seems to imply that it will use VRAM. Is that a consequence of these warnings?
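The snapshot hash in that path will differ from machine to machine, so here is a small sketch for locating the cached checkpoint file(s) to pass to the upgrade command. The helper name find_cached_checkpoints is hypothetical, and the default cache location is just the one that appears in my log (~/.cache/torch/pyannote); other setups may keep the cache elsewhere.

```python
from pathlib import Path
from typing import List

# Assumption: same cache location as in the log above; adjust if yours differs.
DEFAULT_CACHE_ROOT = Path.home() / ".cache" / "torch" / "pyannote"


def find_cached_checkpoints(cache_root: Path = DEFAULT_CACHE_ROOT) -> List[Path]:
    """List every pytorch_model.bin under models--*/snapshots/<hash>/."""
    if not cache_root.is_dir():
        return []  # nothing cached here
    return sorted(cache_root.glob("models--*/snapshots/*/pytorch_model.bin"))
```

Printing each result of find_cached_checkpoints() gives the paths you can hand to python -m pytorch_lightning.utilities.upgrade_checkpoint.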
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.
I found a way to make pyannote use the GPU.
Add pipeline = pipeline.to(0) in preprocessing/preprocess_speaker_diarization.py before line 38
if pipeline is None:
    raise ValueError("Failed to load pipeline")
pipeline = pipeline.to(0)
LOG.info(f"Processing {input_path}. This may take a while...")
diarization = pipeline(
    input_path, min_speakers=min_speakers, max_speakers=max_speakers
)
@Knartzpirat : Yep! It works for me. Well done! Tested with 56 min. of 44 kHz wav audio. The GPU is used. No crash.
@Knartzpirat The commit pyannote/pyannote-audio@9d500dc breaks your fix as it doesn't expect to receive an int as the device.
Replacing pipeline.to(0) with pipeline.to(torch.device("cuda")) makes it properly use the GPU again (add import torch at the top of the file if it is not already imported):
if pipeline is None:
    raise ValueError("Failed to load pipeline")
pipeline = pipeline.to(torch.device("cuda"))
LOG.info(f"Processing {input_path}. This may take a while...")
diarization = pipeline(
    input_path, min_speakers=min_speakers, max_speakers=max_speakers
)
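One caveat: hard-coding "cuda" will presumably fail on a machine without a usable GPU. A possible variant, sketched here with a hypothetical helper name (pick_device_name), falls back to the CPU when CUDA is not available. It only relies on torch.cuda.is_available(); the ImportError branch is there so the helper can be imported without torch, although pyannote itself needs torch anyway.

```python
def pick_device_name(index: int = 0) -> str:
    """Return "cuda:<index>" when CUDA is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Only keeps this helper importable; pyannote cannot run without torch.
        return "cpu"
    return f"cuda:{index}" if torch.cuda.is_available() else "cpu"


# Intended use in preprocess_speaker_diarization.py (not executed here):
#   import torch
#   pipeline = pipeline.to(torch.device(pick_device_name()))
```

The name is wrapped in torch.device(...) because, as noted above, newer pyannote versions no longer accept a bare int.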