[New Model] Parakeet-tdt-0.6b-v2 by NVIDIA
Parakeet-tdt-0.6b-v2 by NVIDIA is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, featuring support for punctuation, capitalization, and accurate timestamp prediction.
The model is less than 3 GB in size and can output both plain text and SRT subtitles. I think it would be a great addition to dsnote's STT model choices for English.
Link: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2
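To illustrate the SRT output mentioned above, here is a minimal sketch of how timestamped segments could be turned into SRT text. It assumes the segments are already available as `(start_seconds, end_seconds, text)` tuples; that tuple shape and the helper names `format_srt_time` and `segments_to_srt` are hypothetical and are not taken from the model's or NeMo's actual API.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start in seconds, end in seconds, text).
Segment = Tuple[float, float, str]


def format_srt_time(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Made-up example segments, not real model output.
    demo = [(0.0, 1.25, "Hello."), (1.25, 2.0, "World.")]
    print(segments_to_srt(demo), end="")
```

In practice the segment list would come from whatever timestamp structure the inference engine exposes, so only the conversion step is sketched here.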
Hi, thanks for the information. NVIDIA NeMo engine support would be needed to enable this model. I will investigate what can be done.
It would be awesome 🙌🏻
@mkiol just wanted to revive this to let you know there's an onnx-asr library that lets you load Parakeet:
https://github.com/istupakov/onnx-asr
The model has been updated from v2 to v3. Link to the updated model: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3