mdx-net

Encountered errors while executing training process #2

Open • Ma5onic opened this issue 2 years ago • 1 comment

(Using the Leaderboard_B branch.) First, conda got stuck solving the environment: I let it run for 30 minutes, but it never finished creating the env from the .yml file. Because I was on a cloud instance and couldn't afford to wait, I did this instead:

conda create -n mdx-net
conda update conda
conda config --add channels conda-forge
conda activate mdx-net
sudo apt-get install soundstretch
python -m pip install -r requirements.txt
python src/utils/data_augmentation.py --data_dir /real/path/to/musdbhq/ --train True --test True
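The site-packages paths in the tracebacks further down (/home/ubuntu/.local/lib/python3.8/...) hint that pip installed into the system Python's user directory rather than into the conda env. As far as I know, conda create -n mdx-net with no package list makes an empty env with no python of its own (unless create_default_packages is configured), so python falls through to the system interpreter. A minimal sketch to check, run with the same python you used for pip; describe_interpreter is just an illustrative name and the conda-env test is a rough heuristic:

```python
import site
import sys


def describe_interpreter():
    """Report which Python is running and where a user-level pip install would go.

    Hypothetical helper: it only reads standard sys/site attributes.
    """
    info = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # True when ~/.local/lib/pythonX.Y/site-packages is on sys.path
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site": site.getusersitepackages(),
    }
    # Heuristic: named conda envs usually live under .../envs/<name>; the system Python does not.
    info["looks_like_conda_env"] = "envs" in sys.prefix.replace("\\", "/").split("/")
    return info


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If executable points at /usr/bin/python3 instead of something under .../envs/mdx-net/, recreating the env with an explicit interpreter (for example conda create -n mdx-net python=3.8) and running pip from inside it is one way to keep everything in one place.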

It seems that the data augmentation step (run before training) fails on songs that don't contain vocals:

python src/utils/data_augmentation.py --data_dir /home/ubuntu/mdx-files/musdb/ --train True --test True
 10%|███████████████▉                                                                                                                                                     | 11/114 [01:13<11:25,  6.65s/it]
Traceback (most recent call last):
  File "src/utils/data_augmentation.py", line 111, in <module>
    main(parser.parse_args())
  File "src/utils/data_augmentation.py", line 30, in main
    save_shifted_dataset(p, t, data_dir, 'train')
  File "src/utils/data_augmentation.py", line 92, in save_shifted_dataset
    source = load_wav(in_path.joinpath(s_name+'.wav'))
  File "src/utils/data_augmentation.py", line 102, in load_wav
    return sf.read(path, samplerate=sr, dtype='float32')[0].T
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 256, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/home/ubuntu/mdx-files/musdb/train/Artificial Intelligence - Native Instruments/vocals.wav': System error.
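The failing file is train/<track>/vocals.wav, and in my experience libsndfile's bare "System error." usually means the file simply could not be opened, most often because it isn't there. A small sketch to list incomplete track folders before running data_augmentation.py, assuming the MUSDB18-HQ layout of one folder per track holding mixture.wav, vocals.wav, bass.wav, drums.wav and other.wav (find_missing_stems is a hypothetical helper, not part of the repo):

```python
from pathlib import Path

# Stem file names as laid out in MUSDB18-HQ (assumption: adjust if your copy differs).
STEMS = ("mixture", "vocals", "bass", "drums", "other")


def find_missing_stems(data_dir, split="train", stems=STEMS):
    """Return {track_folder_name: [missing stem names]} for incomplete tracks."""
    missing = {}
    split_dir = Path(data_dir) / split
    if not split_dir.is_dir():
        return missing  # e.g. no test/ folder yet
    for track_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        absent = [s for s in stems if not (track_dir / f"{s}.wav").is_file()]
        if absent:
            missing[track_dir.name] = absent
    return missing


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for split in ("train", "test"):
        for track, absent in find_missing_stems(root, split).items():
            print(f"{split}/{track}: missing {', '.join(absent)}")
```

Run it as python find_missing_stems.py /home/ubuntu/mdx-files/musdb/ (the script name is a placeholder).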

After I deleted the songs that didn't contain vocals, data augmentation succeeded, but every training attempt failed, and I didn't have time to debug on the cloud GPU instance.
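Deleting the tracks works but shrinks the training set. An untested alternative is to give each such track a silent vocals.wav with the same format and length as its mixture.wav, so the script has a file to open. This sketch uses only the standard-library wave module, so it assumes plain PCM WAV stems (which is what MUSDB18-HQ ships, to my knowledge); fill_silent_stem is a hypothetical helper, and I haven't checked how a silent stem affects training in this repo.

```python
import wave
from pathlib import Path


def fill_silent_stem(track_dir, stem="vocals", reference="mixture"):
    """Create <stem>.wav as digital silence matching <reference>.wav.

    Returns True if a file was written, False if the stem already existed.
    Hypothetical helper; assumes PCM WAV that the stdlib wave module can read.
    """
    track_dir = Path(track_dir)
    target = track_dir / f"{stem}.wav"
    if target.exists():
        return False
    # wave.open only treats str as a file name, so convert Path objects explicitly.
    with wave.open(str(track_dir / f"{reference}.wav"), "rb") as ref:
        params = ref.getparams()
    # Zero bytes are silence for signed 16/24/32-bit PCM; 8-bit WAV is unsigned
    # (silence is 0x80), so refuse it instead of writing a DC offset.
    if params.sampwidth == 1:
        raise ValueError("8-bit WAV is not supported by this sketch")
    frame_bytes = params.nchannels * params.sampwidth
    with wave.open(str(target), "wb") as out:
        out.setparams(params)  # same channels, sample width, rate and frame count
        remaining = params.nframes
        while remaining > 0:
            n = min(65536, remaining)  # write in chunks to avoid one huge buffer
            out.writeframes(b"\x00" * (n * frame_bytes))
            remaining -= n
    return True
```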

Here is the output from: python run.py experiment=multigpu_other model=ConvTDFNet_other

/usr/lib/python3/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/lib/python3/dist-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "run.py", line 7, in <module>
    from pytorch_lightning.utilities import rank_zero_info
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit, pit_permutate
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/__init__.py", line 26, in <module>
    from torchmetrics.functional.audio.pesq import perceptual_evaluation_speech_quality  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/pesq.py", line 20, in <module>
    import pesq as pesq_backend
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/__init__.py", line 5, in <module>
    from ._pesq import pesq, pesq_batch
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/_pesq.py", line 8, in <module>
    from .cypesq import cypesq, cypesq_retvals, cypesq_error_message as pesq_error_message
  File "__init__.pxd", line 238, in init cypesq
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject
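Reading the last line: the C header the extension was compiled with expects a 96-byte ndarray, but the running numpy reports 80, which usually means pesq's compiled module was built against a newer numpy than the one being imported. (The torchvision warning at the top looks like a separate torch/torchvision mismatch and is only a warning.) A quick way to see which versions actually get imported, using only importlib.metadata (Python 3.8+); the package list is just the distributions that appear in this traceback:

```python
from importlib import metadata

# Distributions that appear in the traceback above.
PACKAGES = ("numpy", "pesq", "torchmetrics", "pytorch-lightning", "torch", "torchvision")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:20} {version or 'not installed'}")
```

Since packages here come from both /usr/lib/python3/dist-packages and ~/.local, python -c "import numpy; print(numpy.__version__, numpy.__file__)" shows which numpy wins. One possible fix, not tested here: pip install --force-reinstall --no-cache-dir --no-build-isolation pesq, which should compile pesq against the numpy already installed (it needs numpy and Cython present first). If pesq declares numpy as a build requirement, normal build isolation would instead compile it against whatever numpy is newest at install time.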

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    36W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   34C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

Ma5onic • Aug 22 '22 07:08

@Ma5onic try pip install --upgrade numpy

KimberleyJensen • Aug 22 '22 09:08

Had the same issue. Fixed it by installing older dependencies from around 2021: requirements.txt

Satisfy256 • Sep 16 '22 16:09

> @Ma5onic try pip install --upgrade numpy

@KimberleyJensen Thanks, but the newest version of numpy is incompatible with other installed packages:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.2 requires setuptools<60, but you have setuptools 63.4.1 which is incompatible.
hydra-optuna-sweeper 1.1.0.dev2 requires numpy<1.20.0, but you have numpy 1.23.3 which is incompatible.

Maybe I should have used conda to update it instead. Thanks anyway.
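The two conflict lines show the bind: hydra-optuna-sweeper 1.1.0.dev2 wants numpy<1.20.0, so upgrading numpy breaks it, while the pesq error in the first post points at a build made against a newer numpy. A tiny check of the installed numpy against that bound can save a reinstall cycle. This sketch compares only the leading numeric parts of the version string (pre-release tags like rc1 or dev2 are dropped), so it is a shortcut rather than full PEP 440; where the packaging library is installed, packaging.specifiers.SpecifierSet is the proper tool.

```python
import re
from importlib import metadata


def numeric_version(version):
    """Turn '1.23.3' into (1, 23, 3) and '1.19.5rc1' into (1, 19, 5); a PEP 440 shortcut."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. 'dev2' in '1.1.0.dev2'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at the first non-numeric suffix such as 'rc1'
    return tuple(parts)


def below(version, bound):
    """True if version < bound, comparing numeric components only (padded with zeros)."""
    a, b = numeric_version(version), numeric_version(bound)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) < b + (0,) * (width - len(b))


if __name__ == "__main__":
    installed = metadata.version("numpy")
    ok = below(installed, "1.20.0")  # bound required by hydra-optuna-sweeper 1.1.0.dev2
    print(f"numpy {installed} {'satisfies' if ok else 'violates'} numpy<1.20.0")
```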

@Satisfy256 Ouhhh! Interesting, okay, I'll nuke my current install and start over lol.

Ma5onic • Sep 24 '22 11:09

@KimberleyJensen, you're onto something, though: the current requirements.txt also seems to have an issue related to the one you mentioned here. The requirements.txt that @Satisfy256 mentioned lists demucs<=2.0.3 as a dependency... That file might be a little hidden gem, because I could not find it in the committed file history: https://github.com/kuielab/mdx-net/commits/main/requirements.txt Same with the Leaderboard_B branch/tree: https://github.com/kuielab/mdx-net/commits/Leaderboard_B/requirements.txt
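Since the working requirements.txt isn't in the repo history, one way to see exactly what it changes is to diff the two files' version specifiers. A minimal sketch assuming simple one-requirement-per-line files (a name plus an optional ==, <=, >=, <, >, ~= or != specifier); it skips comments, pip options such as -f, environment markers after ;, and anything it can't parse (e.g. git+ URLs), so it is not a full requirements parser. The file names in the usage comment are placeholders.

```python
import re
import sys

# A package name, optional [extras], then an optional specifier such as "==1.19.5" or "<=2.0.3".
LINE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*([<>=!~].*)?$")


def parse_requirements(text):
    """Map normalized package name -> specifier string ('' if unpinned)."""
    reqs = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank, comment-only, or a pip option such as -f / -r
        match = LINE.match(line)
        if match:
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()  # PEP 503-style normalization
            reqs[name] = (match.group(3) or "").replace(" ", "")
    return reqs


def diff_requirements(old_text, new_text):
    """Return {name: (old_spec, new_spec)} where they differ; None means not listed."""
    old, new = parse_requirements(old_text), parse_requirements(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # usage: python diff_reqs.py requirements.txt requirements_2021.txt  (placeholder names)
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for name, (a, b) in diff_requirements(f_old.read(), f_new.read()).items():
            print(f"{name:25} {a!s:15} -> {b!s}")
```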

Still waiting for conda to solve the environment 😢

Ma5onic • Sep 25 '22 00:09

@Ma5onic I modified the requirements.txt to use old versions. I tested it and it works for me on Ubuntu 20.04.

Satisfy256 • Sep 25 '22 07:09

@Satisfy256 Okay, sick. That gives me hope; I'll start from scratch and try again.

Ma5onic • Sep 29 '22 20:09

Yay! It works!!! Thank you very much

Ma5onic • Sep 30 '22 06:09

Linux users with RTX cards, or anyone using a cloud instance, may run into dependency issues unrelated to the solution above. The PyTorch landing page shows how the install command differs depending on your OS and environment (package manager, CUDA version).
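After installing PyTorch with the command that page generates, a quick sanity check is to print which CUDA version the installed build was compiled for and whether it can see the GPUs. This sketch sticks to long-standing torch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available, torch.cuda.device_count, torch.cuda.get_device_name); torch_cuda_report is a hypothetical helper name, and it reports None instead of failing when torch isn't importable.

```python
def torch_cuda_report():
    """Summarize the installed PyTorch build and visible GPUs (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "built_for_cuda": None, "cuda_available": False, "devices": []}
    available = torch.cuda.is_available()
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if available
        else []
    )
    return {
        "torch": torch.__version__,
        # None here means a CPU-only build, which cannot use the GPUs at all
        "built_for_cuda": torch.version.cuda,
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

As a rule of thumb, built_for_cuda should not be newer than the CUDA Version that nvidia-smi reports for the driver (11.6 in the output in the first post).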

Ma5onic • Oct 01 '22 02:10