
Python version for nuwave

Open · eyacov opened this issue 1 year ago · 5 comments

Hey, which version of Python is this repo compatible with? It doesn't seem to work with Python 3.11.

eyacov commented on Feb 25 '24

Hi @eyacov!

I am not sure about Python version compatibility. It was tested with Python 3.6 and NVIDIA's Docker image [nvcr.io/nvidia/pytorch:20.09-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).

Could you provide the error messages you're encountering?

junjun3518 commented on Mar 06 '24

Hey @junjun3518, I can't use this container because it isn't supported on my GPU (GeForce RTX 4090). When I run the code without a container, I get the following error:

    Traceback (most recent call last):
      File "trainer.py", line 139, in <module>
        train(args)
      File "trainer.py", line 126, in train
        trainer.fit(model)
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
        results = self.accelerator_backend.train()
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
        return self.train_or_test()
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
        results = self.trainer.train()
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in train
        self.run_sanity_check(self.get_model())
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 730, in run_sanity_check
        _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 646, in run_evaluation
        output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 180, in evaluation_step
        output = self.trainer.accelerator_backend.validation_step(args)
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 73, in validation_step
        return self._step(self.trainer.model.validation_step, args)
      File "/home/etay/anaconda3/envs/nuwave/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 65, in _step
        output = model_step(*args)
      File "/media/etay/Daten/localization/nuwave/lightning_model_bird.py", line 239, in validation_step
        0, self.max_step, (wav.shape[0], ), device=self.device) + 1
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

eyacov commented on May 08 '24

Hi @eyacov ,

It looks like the error is caused by a CUDA version issue.

Since this project started in 2020, it uses old versions of PyTorch and CUDA, which do not support the RTX 4090.

For now, I don't have the resources to test this, and I am not authorized to change this repo.

I recommend using a recent base image such as `nvcr.io/nvidia/pytorch:23.07-py3` (I am using it on a 4090 now).
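The "no kernel image is available" error means the installed PyTorch build contains no GPU code that can run on the device's compute capability (8.9 for the RTX 4090). A minimal diagnostic sketch, not part of the nuwave repo, assuming a PyTorch release recent enough to provide `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`. The helpers `parse_arch` and `arch_list_covers` are hypothetical; their rule is a simplification of NVIDIA's compatibility model (binary code runs on the same major version with an equal or newer minor version; PTX can be JIT-compiled for equal or newer GPUs) and it ignores arch-specific targets such as `sm_90a`. The small tensor operation at the end is the definitive test.

```python
def parse_arch(entry):
    """Split 'sm_86' / 'compute_80' into (kind, (major, minor)).

    Returns None for entries this sketch does not handle, e.g. the
    arch-specific 'sm_90a', which only runs on exactly that GPU."""
    kind, _, digits = entry.partition("_")
    if kind not in ("sm", "compute") or not digits.isdigit() or len(digits) < 2:
        return None
    return kind, (int(digits[:-1]), int(digits[-1]))


def arch_list_covers(arch_list, capability):
    """Approximate whether a build's arch list can run on a (major, minor) GPU."""
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, (a_major, a_minor) = parsed
        # Binary (SASS) code: same major version, equal or older minor version.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: the driver can JIT-compile it for equal or newer GPUs.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("torch", torch.__version__, "CUDA", torch.version.cuda)
        print("device capability:", cap, "build archs:", archs)
        print("arch list covers device:", arch_list_covers(archs, cap))
        try:
            # Same kind of call (torch.randint on the GPU) that failed in the traceback.
            print(torch.randint(0, 10, (2,), device="cuda") + 1)
        except RuntimeError as err:
            print("kernel launch failed:", err)
```

Run it inside the environment or container you plan to train in; if the last line prints a tensor rather than a kernel-launch failure, that PyTorch build can run on the GPU.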

junjun3518 commented on May 08 '24

Hi @junjun3518,

Running the code in this Docker image leads to the following error:

    Traceback (most recent call last):
      File "/media/etay/Daten/localization/nuwave/trainer.py", line 1, in <module>
        from lightning_model import NuWave
      File "/media/etay/Daten/localization/nuwave/lightning_model.py", line 9, in <module>
        import pytorch_lightning as pl
      File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/__init__.py", line 66, in <module>
        from pytorch_lightning import metrics
      File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/__init__.py", line 14, in <module>
        from pytorch_lightning.metrics.metric import Metric
      File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/metric.py", line 23, in <module>
        from pytorch_lightning.metrics.utils import _flatten, dim_zero_cat, dim_zero_mean, dim_zero_sum
      File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/utils.py", line 18, in <module>
        from pytorch_lightning.utilities import rank_zero_warn
      File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/__init__.py", line 24, in <module>
        from pytorch_lightning.utilities.apply_func import move_data_to_device
      File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/apply_func.py", line 25, in <module>
        from torchtext.data import Batch
    ImportError: cannot import name 'Batch' from 'torchtext.data' (/usr/local/lib/python3.10/dist-packages/torchtext/data/__init__.py)

Do I need to change something in requirements.txt to make this work?

eyacov commented on May 14 '24

I managed to solve the issue. The requirements file needs to read as follows:

    ffmpeg
    torchtext==0.6.0
    pytorch_lightning==1.1.6
    prefetch_generator
    librosa==0.8.0
    omegaconf==2.0.6

You might want to consider updating the repo's `requirements.txt` accordingly.
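The `ImportError` above comes from `pytorch_lightning` importing `Batch` from `torchtext.data`, which the torchtext shipped in the newer container does not provide; pinning the versions listed in this comment resolved it. A small sketch for confirming that an environment actually matches those pins before training, using only the standard library (`importlib.metadata`, Python 3.8+, so it runs under the container's Python 3.10 but not in a Python 3.6 env). `PINS`, `normalize`, `find_mismatches` and `installed_versions` are hypothetical names; versions are compared as exact strings, a simplification of pip's `==` semantics, and the unpinned entries (`ffmpeg`, `prefetch_generator`) are skipped.

```python
import re
from importlib import metadata

# Pinned versions reported in this thread (unpinned packages omitted).
PINS = {
    "torchtext": "0.6.0",
    "pytorch_lightning": "1.1.6",
    "librosa": "0.8.0",
    "omegaconf": "2.0.6",
}


def normalize(name):
    """PEP 503-style name normalization: 'pytorch_lightning' == 'pytorch-lightning'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for every pin that is not satisfied.

    `installed` maps package names to version strings; `found` is None
    when the package is not installed at all."""
    found_by_name = {normalize(k): v for k, v in installed.items()}
    mismatches = {}
    for name, wanted in pins.items():
        found = found_by_name.get(normalize(name))
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions():
    """Collect name -> version for every distribution visible to this interpreter."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            versions[name] = dist.version
    return versions


if __name__ == "__main__":
    problems = find_mismatches(PINS, installed_versions())
    for name, (wanted, found) in problems.items():
        print(f"{name}: want {wanted}, found {found or 'not installed'}")
    if not problems:
        print("All pinned versions match.")
```

Normalizing names matters because pip reports the distribution as `pytorch-lightning` while the requirements line spells it `pytorch_lightning`.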

eyacov commented on May 21 '24