librispeech_conformer_rnnt example gives error: "AttributeError: 'tuple' object has no attribute 'targets'"
🐛 Describe the bug
Hi, I am trying to run the librispeech_conformer_rnnt ASR example (given here) locally.
I installed the nightly versions of PyTorch and TorchAudio, along with the dependencies mentioned in the README. I then ran both train_spm.py and global_stats.py, with the output files placed in the same directory as train.py.
I have a single machine with a single GPU (running on WSL2, Ubuntu 22.04), and I am trying to run a single epoch with the command:
python3 train.py --exp-dir ./experiments --librispeech-path ~/ --global-stats-path ./global_stats.json --sp-model-path ./spm_unigram_1023.model --epochs 1 --nodes 1 --gpus 1
I get the error "AttributeError: 'tuple' object has no attribute 'targets'". The full terminal output is below.
kiko@Kiko-Legion:~/audio/examples/asr/librispeech_conformer_rnnt$ python train.py --exp-dir ./experiments --librispeech-path ~/ --global-stats-path ./global_stats.json --sp-model-path ./spm_unigram_1023.model --epochs 1 --nodes 1 --gpus 1
WARNING:2023-11-28 11:38:28 617:617 init.cpp:155] function cbapi->getCuptiStatus() failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2023-11-28 11:38:28 617:617 init.cpp:156] CUPTI initialization failed - CUDA profiler activities will be missing
INFO:2023-11-28 11:38:28 617:617 init.cpp:158] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
Seed set to 1
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[rank: 0] Seed set to 1
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
You are using a CUDA device ('NVIDIA GeForce RTX 3060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
2023-11-28 11:38:40.493932: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-28 11:38:40.493999: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-28 11:38:40.494779: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-28 11:38:40.499513: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-28 11:38:41.243684: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-----------------------------------
0 | model | RNNT | 30.2 M
1 | loss | RNNTLoss | 0
-----------------------------------
30.2 M Trainable params
0 Non-trainable params
30.2 M Total params
120.937 Total estimated model params size (MB)
Sanity Checking: |          | 0/? [00:00<?, ?it/s]
/home/kiko/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 10 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/kiko/audio/examples/asr/librispeech_conformer_rnnt/train.py", line 111, in <module>
cli_main()
File "/home/kiko/audio/examples/asr/librispeech_conformer_rnnt/train.py", line 107, in cli_main
run_train(args)
File "/home/kiko/audio/examples/asr/librispeech_conformer_rnnt/train.py", line 53, in run_train
trainer.fit(model, data_module, ckpt_path=args.checkpoint_path)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
return function(*args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
results = self._run_stage()
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
self._run_sanity_check()
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1062, in _run_sanity_check
val_loop.run()
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 134, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 391, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 402, in validation_step
return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 633, in __call__
wrapper_output = wrapper_module(*args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward return self.module(*inputs, **kwargs) # type: ignore[index]
File "/home/kiko/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/kiko/.local/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 626, in wrapped_forward
out = method(*_args, **_kwargs)
File "/home/kiko/audio/examples/asr/librispeech_conformer_rnnt/lightning.py", line 156, in validation_step
return self._step(batch, batch_idx, "val")
File "/home/kiko/audio/examples/asr/librispeech_conformer_rnnt/lightning.py", line 102, in _step
prepended_targets = batch.targets.new_empty([batch.targets.size(0), batch.targets.size(1) + 1])
AttributeError: 'tuple' object has no attribute 'targets'
Am I running train.py incorrectly? Or does it have to be launched via SLURM?
Thank you in advance for the help.
Versions
WARNING:2023-11-28 11:52:37 866:866 init.cpp:155] function cbapi->getCuptiStatus() failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2023-11-28 11:52:37 866:866 init.cpp:156] CUPTI initialization failed - CUDA profiler activities will be missing
INFO:2023-11-28 11:52:37 866:866 init.cpp:158] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
Collecting environment information...
PyTorch version: 2.2.0.dev20231127+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.3.103
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 546.17
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 5 5600H with Radeon Graphics
CPU family: 25
Model: 80
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 0
BogoMIPS: 6587.46
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 2 MiB (4 instances)
L3 cache: 16 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] pytorch-lightning==2.1.2
[pip3] pytorch-triton==2.1.0+6e4932cda8
[pip3] torch==2.2.0.dev20231127+cu121
[pip3] torchaudio==2.2.0.dev20231127+cu121
[pip3] torchdata==0.7.1
[pip3] torchmetrics==1.2.0
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.17.0.dev20231127+cpu
[pip3] torchvision==0.17.0.dev20231127+cu121
[pip3] triton==2.1.0
[conda] Could not collect
I found the issue. In the file lightning.py we have:
#
# ...
#
def _step(self, batch, _, step_type):
    if batch is None:
        return None
    prepended_targets = batch.targets.new_empty([batch.targets.size(0), batch.targets.size(1) + 1])
#
# ...
#
def training_step(self, batch: Batch, batch_idx):
    #
    # ...
    #
    loss = self._step(batch, batch_idx, "train")
    batch_size = batch.features.size(0)
    #
    # ...
    #
Somewhere along the way, the variable batch is no longer of type Batch, hence the error. The short sketch below illustrates the failure mode; my workaround follows it.
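As an illustration of what is going wrong, here is a minimal standalone snippet (only the field names mirror the example's Batch; everything else is made up) showing that a named tuple rebuilt as a plain tuple keeps its data by position but loses attribute access:

from collections import namedtuple

# Hypothetical stand-in for the example's Batch named tuple.
Batch = namedtuple("Batch", ["features", "feature_lengths", "targets", "target_lengths"])

batch = Batch(features=[1.0], feature_lengths=[1], targets=[2], target_lengths=[1])
print(batch.targets)      # [2] -- attribute access works on the named tuple

degraded = tuple(batch)   # e.g. some collate or transfer step rebuilding the batch
print(degraded[2])        # [2] -- the data is still there, by position
print(degraded.targets)   # AttributeError: 'tuple' object has no attribute 'targets'

To work around it, I had to make a small change like: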
#
# ...
#
def _step(self, batch_in: Batch, _, step_type):
    if batch_in is None:
        return None
    # Rebuild the named tuple from the plain tuple that arrives here.
    batch = Batch(batch_in[0], batch_in[1], batch_in[2], batch_in[3])
    prepended_targets = batch.targets.new_empty([batch.targets.size(0), batch.targets.size(1) + 1])
#
# ...
#
def training_step(self, batch_in: Batch, batch_idx):
    #
    # ...
    #
    batch = Batch(batch_in[0], batch_in[1], batch_in[2], batch_in[3])
    loss = self._step(batch, batch_idx, "train")
    batch_size = batch.features.size(0)
    #
    # ...
    #
I am sure there is a better way to enforce/check this, but I am not particularly familiar with Python, nor with why passing the batch through the training loop seems to turn the named tuple Batch into a plain tuple.
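For what it's worth, here is a slightly tidier sketch of the same workaround (the helper name _coerce_batch is mine, and it assumes Batch is the named tuple defined in lightning.py, so this is not a vetted fix): unpacking with * avoids hard-coding the number of fields, and an isinstance check makes the conversion a no-op when the batch already has the right type:

def _coerce_batch(batch):
    # Hypothetical helper: rebuild the named tuple if a plain tuple arrives.
    if batch is None or isinstance(batch, Batch):
        return batch
    return Batch(*batch)

def _step(self, batch_in, _, step_type):
    batch = _coerce_batch(batch_in)
    if batch is None:
        return None
    prepended_targets = batch.targets.new_empty([batch.targets.size(0), batch.targets.size(1) + 1])
    # ...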