`EncDecCTCModel.transcribe(audio=...)` changed to `EncDecCTCModel.transcribe(paths2audio_files=...)`

Open ambiSk opened this issue 9 months ago • 6 comments

Description: I updated NeMo to 1.23.0 and am trying to use the pretrained `EncDecCTCModel.transcribe`. In the previous version I passed audio tensors loaded with torchaudio. Now it asks for `paths2audio_files`, and when I pass a file path it does not transcribe the whole file, only the first 100000 data points. The latest NVIDIA documentation has no reference to `paths2audio_files`; instead the argument is `audio`, which takes a tensor. How can I get that functionality back so the whole file is transcribed?

Steps/Code to reproduce bug

import nemo.collections.asr as nemo_asr
import torch, torchaudio
wav_path = '1.wav'
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="stt_hi_conformer_ctc_medium", map_location=torch.device('cuda'))
aud, sr = torchaudio.load(wav_path)
asr_model.transcribe(audio=aud)
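
To confirm which keyword the installed version actually accepts, the method signature can be inspected at runtime. This is a minimal sketch that assumes nothing about NeMo beyond the two argument names mentioned in this issue; `first_accepted_keyword` is a hypothetical helper, and `fake_transcribe` is a stand-in so the snippet runs without NeMo installed:

```python
import inspect


def first_accepted_keyword(func, candidates):
    """Return the first name in `candidates` that `func` declares as a parameter, else None."""
    params = inspect.signature(func).parameters
    for name in candidates:
        if name in params:
            return name
    return None


# Stand-in with the path-based signature described in this issue (not NeMo's real method).
def fake_transcribe(paths2audio_files, batch_size=4):
    return []


keyword = first_accepted_keyword(fake_transcribe, ["audio", "paths2audio_files"])
print(keyword)
```

With the real model, the equivalent call would be `first_accepted_keyword(asr_model.transcribe, ["audio", "paths2audio_files"])`. A method that only takes `**kwargs` would not be detected this way.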

Expected behavior: We get a transcription when we pass the path in a list. When we pass a tensor instead, we get an error saying the tensor can't be converted to JSON.
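
One possible stopgap while only the path-based argument is available is to write the in-memory samples to a temporary WAV file and pass that path. The sketch below is an assumption-laden illustration, not NeMo's method: `write_temp_wav` and `count_frames` are hypothetical helpers built on the standard library only, the audio is assumed to be mono floats in [-1.0, 1.0], and the `transcribe` call is left as a comment because its exact behavior in 1.23.0 is not confirmed here.

```python
import os
import struct
import tempfile
import wave


def write_temp_wav(samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a temporary 16-bit PCM WAV; return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return path


def count_frames(path):
    """Return the number of audio frames stored in a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes()


# 150000 samples of silence: longer than the 100000 data points mentioned above.
path = write_temp_wav([0.0] * 150000, 16000)
print(count_frames(path))
# Hypothetical call, following the keyword named in this issue:
# asr_model.transcribe(paths2audio_files=[path])
```

With a torchaudio tensor, something like `aud[0].tolist()` would give the sample list for the first channel. Resampling to the rate the model expects is not handled here, and checking the frame count first helps tell a short input file apart from truncation inside `transcribe`.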

Environment overview (please complete the following information)

Environment location: E2E network
Method of NeMo install: pip, via `pip install "nemo-toolkit[all]"`

Environment details

If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:

OS version: Ubuntu 22.04
PyTorch version: 2.2.0
Python version: 3.10.14

Additional context

GPU Model: Tesla V100

May 17 '24 08:05 ambiSk
