NeMo
`EncDecCTCModel.transcribe(audio=...)` changed to `EncDecCTCModel.transcribe(paths2audio_files=...)`
Description:
I updated NeMo to 1.23.0 and am trying to use the pretrained `EncDecCTCModel.transcribe`. In the previous version I passed audio tensors loaded with torchaudio, but now it asks for `paths2audio_files`. When I pass a file path, it doesn't transcribe the whole file, only the first 100000 data points. The latest NVIDIA documentation has no reference to `paths2audio_files`; instead the argument is `audio`, which takes a tensor. How can I get that functionality back and transcribe the whole file?
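Since the keyword seems to differ between releases (`paths2audio_files` in the installed 1.23.0, `audio` in the latest documentation), a small wrapper could check which one the installed `transcribe` accepts before calling it. This is only a sketch: `transcribe_paths` is a hypothetical helper, not part of NeMo, and it assumes the `audio` keyword also accepts a list of file paths, which should be checked against the installed version.

```python
import inspect


def transcribe_paths(model, paths, **kwargs):
    """Call model.transcribe with whichever keyword this version accepts.

    `transcribe_paths` is a hypothetical helper, not a NeMo API.
    Assumption: when the `audio` keyword exists, it accepts file paths too.
    """
    params = inspect.signature(model.transcribe).parameters
    if "audio" in params:
        # Keyword named in the latest documentation.
        return model.transcribe(audio=paths, **kwargs)
    if "paths2audio_files" in params:
        # Keyword requested by the installed 1.23.0 release.
        return model.transcribe(paths2audio_files=paths, **kwargs)
    raise TypeError("transcribe() accepts neither 'audio' nor 'paths2audio_files'")
```

With the model from the reproduction code, this would be called as `transcribe_paths(asr_model, ['1.wav'])`.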
Steps/Code to reproduce bug
import nemo.collections.asr as nemo_asr
import torch, torchaudio
wav_path = '1.wav'
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="stt_hi_conformer_ctc_medium", map_location=torch.device('cuda'))
aud, sr = torchaudio.load(wav_path)
asr_model.transcribe(audio=aud)
Expected behavior
Passing a tensor should return the transcription. Currently we only get a transcription when we pass the file path in a list; passing a tensor fails with an error saying the tensor can't be converted to JSON.
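Until tensor input works again, one possible workaround is to write the in-memory samples to a temporary WAV file and pass that path in a list. The sketch below uses only the standard library; `write_temp_wav` is a hypothetical helper, it assumes mono audio with float samples in [-1.0, 1.0], and it does no resampling. It does not address the truncation to the first 100000 data points described above.

```python
import array
import os
import sys
import tempfile
import wave


def write_temp_wav(samples, sample_rate):
    """Write mono float samples to a temporary 16-bit PCM WAV file.

    Hypothetical helper, not a NeMo API. Returns the path of the new file;
    the caller is responsible for deleting it.
    """
    # Clip to [-1.0, 1.0] and scale to signed 16-bit integers.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return path
```

For the single-channel tensor in the reproduction code, the call would look like `path = write_temp_wav(aud.squeeze(0).tolist(), sr)` followed by `asr_model.transcribe(paths2audio_files=[path])`.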
Environment overview (please complete the following information)
- Environment location: E2E network
- Method of NeMo install: pip, via `pip install "nemo-toolkit[all]"`
Environment details
If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:
- OS version: Ubuntu 22.04
- PyTorch version: 2.2.0
- Python version: 3.10.14
Additional context
GPU Model: Tesla V100