silero-models
Bug report: RuntimeError with the latest jit Spanish model
🐛 Bug
When following the examples in the colab_examples notebook (PyTorch Example / More Examples section), using the latest Spanish jit model raises the following runtime error:
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
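The failure can be reproduced with torch.stft alone. Below is a minimal sketch (plain PyTorch on the notebook's 2.1.0 build, arbitrary sizes) illustrating what the error message asks for; note that the traced Spanish model bakes in the old stft call without return_complex, so the flag cannot simply be added from the caller's side:

import torch

x = torch.randn(16000)          # one second of fake 16 kHz audio
window = torch.hann_window(400)

# Without return_complex this raises the RuntimeError quoted above
# on the notebook's PyTorch 2.1.0:
#   torch.stft(x, n_fft=400, hop_length=160, win_length=400, window=window)

# Passing the parameter explicitly satisfies the requirement:
spec = torch.stft(x, n_fft=400, hop_length=160, win_length=400,
                  window=window, return_complex=True)
print(spec.shape, spec.dtype)   # complex spectrogram, e.g. torch.complex64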
To Reproduce
Steps to reproduce the behavior:
- Open the colab_examples.ipynb notebook
- In the PyTorch Example / More Examples section, in the cell that loads the model and decoder, change the English model to the Spanish one:
# model, decoder = init_jit_model(models.stt_models.en.latest.jit, device=device)
model, decoder = init_jit_model(models.stt_models.es.latest.jit, device=device)
- Keep running the notebook for two more cells, up to the loop where the model is called; that is where the error shows up (a condensed reproduction sketch is included after the traceback below):
RuntimeError Traceback (most recent call last)
<ipython-input-29-2c955b63e0bf> in <cell line: 4>()
2 input = prepare_model_input(read_batch(random.sample(batches, k=1)[0]),
3 device=device)
----> 4 output = model(input)
5 for example in output:
6 print(decoder(example.cpu()))
1 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1528
1529 try:
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/stt_pretrained/models/model.py", line 42, in forward
_4 = self.win_length
_5 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None)
x0 = __torch__.torch.functional.stft(x, _2, _3, _4, _5, True, "reflect", False, True, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_6 = torch.slice(x0, 0, 0, 9223372036854775807, 1)
_7 = torch.slice(_6, 1, 0, 9223372036854775807, 1)
File "code/__torch__/torch/functional.py", line 20, in stft
else:
input0 = input
_2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided)
~~~~~~~~~~ <--- HERE
return _2
Traceback of TorchScript, original code (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft
input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
input = input.view(input.shape[-signal_dim:])
return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided)
~~~~~~~~ <--- HERE
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
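For completeness, here is a condensed, self-contained sketch of the failing cells. The helper names come from silero-models; the import path and the wav glob are assumptions standing in for however the notebook obtains its helpers and test files:

import random
from glob import glob

import torch
from omegaconf import OmegaConf
# Helpers shipped with silero-models (the notebook may obtain them differently):
from utils import (init_jit_model, split_into_batches,
                   read_batch, prepare_model_input)

device = torch.device('cpu')
models = OmegaConf.load('models.yml')          # silero-models model registry

# Spanish jit model instead of the English one used by the notebook
model, decoder = init_jit_model(models.stt_models.es.latest.jit, device=device)

test_files = glob('*.wav')                     # any audio files the helpers can read
batches = split_into_batches(test_files, batch_size=10)

input = prepare_model_input(read_batch(random.sample(batches, k=1)[0]),
                            device=device)
output = model(input)                          # <-- RuntimeError is raised here
for example in output:
    print(decoder(example.cpu()))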
Expected behavior
The audio file should be transcribed to text.
Environment
The environment of the colab_examples notebook itself:
Collecting environment information...
PyTorch version: 2.1.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.27.7
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.120+-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.6
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Stepping: 0
BogoMIPS: 4399.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32 KiB (1 instance)
L1i cache: 32 KiB (1 instance)
L2 cache: 256 KiB (1 instance)
L3 cache: 55 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0,1
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable; SMT Host state unknown
Vulnerability Meltdown: Vulnerable
Vulnerability Mmio stale data: Vulnerable
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==2.1.0+cu118
[pip3] torchaudio==2.1.0+cu118
[pip3] torchdata==0.7.0
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.16.0
[pip3] torchvision==0.16.0+cu118
[pip3] triton==2.1.0
[conda] Could not collect
Looks like the Spanish model is too old.
Greetings. I'm facing the same problem here. I've managed to get the onnx Spanish model working, but I'd like to know if there's any way to use the jit model as it is now. Is there a previous version of torch that would run it? Thanks in advance for any response.
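For reference, a minimal sketch of the ONNX route mentioned above, following the pattern of the repository's ONNX example; the input tensor name 'input', the registry URL, and the test file name are taken from that example or assumed here and may need adjusting:

import torch
import onnxruntime
from omegaconf import OmegaConf

# Fetch the model registry and the latest Spanish ONNX model
torch.hub.download_url_to_file(
    'https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml',
    'models.yml')
models = OmegaConf.load('models.yml')
torch.hub.download_url_to_file(models.stt_models.es.latest.onnx, 'model.onnx')

# Reuse the Spanish decoder and the audio helpers shipped with silero_stt
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                   model='silero_stt', language='es')
(read_batch, split_into_batches, read_audio, prepare_model_input) = utils

ort_session = onnxruntime.InferenceSession('model.onnx')

test_files = ['spanish_sample.wav']            # any file the helpers can read
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# ONNX inference and decoding
ort_outs = ort_session.run(None, {'input': input.detach().cpu().numpy()})
print(decoder(torch.Tensor(ort_outs[0])[0]))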