mdx-net
Encountered errors while executing training process #2
(Using Leaderboard_B) First, conda got stuck on the "solving environment" step. I let it sit for 30 minutes, but it never finished creating the env from the yml. Because I was using a cloud instance, I didn't have time to wait, so I did this instead:
conda create -n mdx-net
conda update conda
conda config --add channels conda-forge
conda activate mdx-net
sudo apt-get install soundstretch
python -m pip install -r requirements.txt
python src/utils/data_augmentation.py --data_dir /real/path/to/musdbhq/ --train True --test True
It seems that I can't train the model with songs that don't contain vocals; the data augmentation step fails on them:
python src/utils/data_augmentation.py --data_dir /home/ubuntu/mdx-files/musdb/ --train True --test True
10%|███████████████▉ | 11/114 [01:13<11:25, 6.65s/it]
Traceback (most recent call last):
File "src/utils/data_augmentation.py", line 111, in <module>
main(parser.parse_args())
File "src/utils/data_augmentation.py", line 30, in main
save_shifted_dataset(p, t, data_dir, 'train')
File "src/utils/data_augmentation.py", line 92, in save_shifted_dataset
source = load_wav(in_path.joinpath(s_name+'.wav'))
File "src/utils/data_augmentation.py", line 102, in load_wav
return sf.read(path, samplerate=sr, dtype='float32')[0].T
File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 256, in read
with SoundFile(file, 'r', samplerate, channels,
File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
_error_check(_snd.sf_error(file_ptr),
File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/home/ubuntu/mdx-files/musdb/train/Artificial Intelligence - Native Instruments/vocals.wav': System error.
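For anyone hitting the same RuntimeError, it may help to list the affected track folders before starting the augmentation run rather than finding them one crash at a time. The following is only a minimal sketch: the stem list is an assumption based on the usual MUSDB18-HQ layout (the traceback above only confirms vocals.wav), and find_incomplete_tracks is a hypothetical helper, not part of the mdx-net repo. It only catches missing or zero-byte files, so a corrupt file would still slip through.

```python
from pathlib import Path

# Assumed stem names (typical MUSDB18-HQ layout); only vocals.wav is
# confirmed by the traceback in this thread.
STEMS = ("mixture", "bass", "drums", "other", "vocals")


def find_incomplete_tracks(data_dir, split="train", stems=STEMS):
    """Return {track_name: [missing or empty stem files]} for one split."""
    problems = {}
    split_dir = Path(data_dir) / split
    for track_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        bad = []
        for stem in stems:
            wav = track_dir / f"{stem}.wav"
            # A missing or zero-byte file cannot be opened by an audio reader.
            if not wav.is_file() or wav.stat().st_size == 0:
                bad.append(wav.name)
        if bad:
            problems[track_dir.name] = bad
    return problems
```

Calling find_incomplete_tracks("/home/ubuntu/mdx-files/musdb/") returns a dict keyed by track folder name, so the problem tracks can be reviewed (or moved aside) in one pass.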
I deleted the songs that didn't contain vocals, and then the data augmentation succeeded. However, all attempts to train failed, and I didn't have time to debug on the cloud GPU instance.
Here is the output from: python run.py experiment=multigpu_other model=ConvTDFNet_other
/usr/lib/python3/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/lib/python3/dist-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
File "run.py", line 7, in <module>
from pytorch_lightning.utilities import rank_zero_info
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
from pytorch_lightning import metrics # noqa: E402
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
from pytorch_lightning.metrics.classification import ( # noqa: F401
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
from pytorch_lightning.metrics.classification.accuracy import Accuracy # noqa: F401
File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
from torchmetrics import Accuracy as _Accuracy
File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
from torchmetrics import functional # noqa: E402
File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
from torchmetrics.functional.audio.pit import permutation_invariant_training, pit, pit_permutate
File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/__init__.py", line 26, in <module>
from torchmetrics.functional.audio.pesq import perceptual_evaluation_speech_quality # noqa: F401
File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/pesq.py", line 20, in <module>
import pesq as pesq_backend
File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/__init__.py", line 5, in <module>
from ._pesq import pesq, pesq_batch
File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/_pesq.py", line 8, in <module>
from .cypesq import cypesq, cypesq_retvals, cypesq_error_message as pesq_error_message
File "__init__.pxd", line 238, in init cypesq
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-PCI... On | 00000000:07:00.0 Off | 0 |
| N/A 35C P0 36W / 250W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-PCI... On | 00000000:08:00.0 Off | 0 |
| N/A 34C P0 33W / 250W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
@Ma5onic try pip install --upgrade numpy
Had the same issue. Fixed by installing old dependencies from around 2021. requirements.txt
@KimberleyJensen Thanks, but the newest version of numpy is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.2 requires setuptools<60, but you have setuptools 63.4.1 which is incompatible.
hydra-optuna-sweeper 1.1.0.dev2 requires numpy<1.20.0, but you have numpy 1.23.3 which is incompatible.
Maybe I should have used conda to update instead. Thanks anyways.
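The "numpy.ndarray size changed" ValueError usually means a compiled extension (here pesq) was built against a different NumPy than the one that is installed, so it can help to see which versions are actually present before and after changing anything. A minimal sketch, assuming Python 3.8+ (for importlib.metadata); the package list is simply the set of names that appear in the error messages in this thread, and installed_versions is a hypothetical helper:

```python
from importlib import metadata

# Distribution names taken from the tracebacks and pip messages in this thread.
PACKAGES = (
    "numpy",
    "pesq",
    "torchmetrics",
    "pytorch-lightning",
    "numba",
    "setuptools",
    "hydra-optuna-sweeper",
)


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:22} {version or 'not installed'}")
```

Comparing this output against the pinned requirements.txt shows at a glance which packages drifted, for example a numpy that no longer satisfies the numpy<1.20.0 constraint reported by pip above.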
@Satisfy256 Ouhhh! interesting, okay I'll nuke my current install and start over lol.
@KimberleyJensen, you're onto something though: the current requirements.txt also seems to contain an issue related to the one you mentioned here. The requirements.txt that @Satisfy256 mentioned has demucs<=2.0.3 listed as a dependency... that file might be a little hidden gem, because I could not find it in the commit history:
https://github.com/kuielab/mdx-net/commits/main/requirements.txt
Same with the Leaderboard_B branch/tree: https://github.com/kuielab/mdx-net/commits/Leaderboard_B/requirements.txt
Still waiting for conda to solve the environment 😢
@Ma5onic I modified the requirements.txt to use old versions. I tested it out and it works for me in Ubuntu 20.04
@Satisfy256 okay, sick. That gives me hope, I'll start from scratch and try again.
yay! it works!!! Thank you very much
Linux users with RTX cards, or anyone using a cloud instance, will encounter dependency issues unrelated to the solution above. The PyTorch landing page shows how the install commands differ based on your OS/env.
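To check whether the installed PyTorch build matches the machine, something like the following can be run inside the environment. This is a rough sketch, not an official diagnostic: describe_torch_build is a hypothetical helper, and it only reports what the installed torch package says about itself.

```python
def describe_torch_build():
    """Report whether torch imports and which CUDA version it was built for."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


print(describe_torch_build())
```

Note that the "CUDA Version" shown by nvidia-smi is the highest version the driver supports, not necessarily the one torch was built for. If built_for_cuda is None or cuda_available is False on a GPU machine, the install command for that OS/env is probably the wrong one.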