whisper-jax
Word timestamps [HELP]
I'm not able to get the transcription with word timestamps, only sentence timestamps.
Is this possible with whisper-jax?
Thanks
Hey @RaulKite! Not yet - since this is a fairly new Whisper feature, we first need to add it to Hugging Face Transformers, and then propagate the changes on to Whisper JAX. Hoping to have this as soon as possible.
First off, kudos on this achievement. We use both Google Speech and AWS, but this is just stellar performance!
- Quick question on the above: is there a timeline for the word timestamps? Word timestamps are critical for playing video with transcripts, and all ASR systems provide them. We are considering using this in production to replace AWS, but without word timestamps that might not be possible. If you can provide a timeline, we would appreciate it a lot.
- And is there a way to detect "uh" and "um" in the speech? It would be highly beneficial for educational purposes.
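To make the video use case concrete, here is a minimal sketch of how a player might pick the word to highlight at a given playback time. It assumes word-level output shaped as `(word, start, end)` tuples in seconds; that format, the sample data, and the `word_at` helper are all hypothetical, and none of this is something whisper-jax returns today.

```python
from bisect import bisect_right

# Hypothetical word-level output: (word, start_seconds, end_seconds).
# This shape is an assumption for illustration, not a whisper-jax format.
words = [
    ("hello", 0.00, 0.42),
    ("and", 0.50, 0.61),
    ("welcome", 0.61, 1.10),
]

# Start times are sorted, so a binary search finds the candidate word.
starts = [start for _, start, _ in words]


def word_at(t):
    """Return the word being spoken at time t, or None during silence."""
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i < 0:
        return None
    word, _, end = words[i]
    return word if t < end else None
```

With sentence-level timestamps only, the same lookup could highlight a whole sentence at best, which is why word-level timing matters for synced transcripts.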
Hey @anindyagupta - if anyone in the community would like to take a stab at adding word-level timestamps to 🤗 Transformers I'd be happy to guide the integration process and review PRs! Otherwise, I'm hoping to see to it by maybe next week. The full integration might take ~1.5-2 weeks?
This would be possible through prompting (an ongoing PR in 🤗 Transformers: https://github.com/huggingface/transformers/pull/22496). Again, once this is merged I'll propagate it on to Whisper JAX ASAP!
Yes, I am interested. In any case, this works well: https://github.com/linto-ai/whisper-timestamped
I currently use ^ whisper-timestamped and am looking to migrate to Whisper-JAX because of its fantastic speed. Really appreciate that you've got this on the docket for this repo so quickly. Looking forward to seeing this get integrated.
Hi, has there been any update on this? It appears the above PR in 🤗 Transformers has been merged. Sub-second timestamps would be really useful in my own application :)
@sanchit-gandhi is there any estimate for integrating both the initial prompt and word timestamps?
Amazing project with super fast transcription, but it is still missing this very important feature: word-by-word timestamps.
Is there any news about word-level timestamps?
Yes, I would also love it if this were integrated :)