Whisper word-level timestamps

Open apssouza22 opened this issue 8 months ago • 6 comments

Is it possible to get word-level timestamps when using the Whisper example? I tried adding the word_timestamps parameter to the model options, but it didn't work.

Any help on how to achieve this would be very welcome. Thanks.

apssouza22 avatar Mar 08 '25 22:03 apssouza22
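
For reference, as far as I know Transformers.js requests word-level timestamps through the `return_timestamps: 'word'` option passed when calling the automatic-speech-recognition pipeline. The following is a minimal sketch, assuming the `@huggingface/transformers` package is installed; `Xenova/whisper-tiny.en` is only an illustrative checkpoint, and `transcribeWithWordTimestamps` and `formatWordChunks` are hypothetical helper names, not part of the library.

```javascript
// Hypothetical helper: load the ASR pipeline and request word-level timestamps.
// Assumes @huggingface/transformers is installed; the model id is illustrative.
async function transcribeWithWordTimestamps(audioUrl, modelId = 'Xenova/whisper-tiny.en') {
  const { pipeline } = await import('@huggingface/transformers');
  const transcriber = await pipeline('automatic-speech-recognition', modelId);
  // 'word' asks for one chunk per word instead of one chunk per segment.
  const output = await transcriber(audioUrl, { return_timestamps: 'word' });
  // Expected shape: [{ text, timestamp: [start, end] }, ...]
  return output.chunks;
}

// Hypothetical helper: render chunks as "start-end: word" lines.
// The end time can be missing for a trailing chunk, so it is handled explicitly.
function formatWordChunks(chunks) {
  return chunks.map(({ text, timestamp: [start, end] }) => {
    const endLabel = end == null ? '?' : end.toFixed(2);
    return `${start.toFixed(2)}-${endLabel}: ${text.trim()}`;
  });
}
```

Calling `formatWordChunks(await transcribeWithWordTimestamps(url))` would then give one line per word. Whether this works for a given checkpoint depends on the model export, as the replies below show for the distil variants.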

They don't appear to work with the distil models. The old Xenova ones seem fine, though they are less accurate for me than the distilled version in this sample: https://huggingface.co/spaces/Xenova/distil-whisper-web. That sample only gives sentence-level timestamps, though.

danieloi avatar May 01 '25 13:05 danieloi

@danieloi I managed to do it here https://github.com/apssouza22/video-text-edit

apssouza22 avatar May 02 '25 08:05 apssouza22

@apssouza22 does it work for word-level timestamps when you use the distil variants of the model?

danieloi avatar May 02 '25 09:05 danieloi

Oh sorry, the distilled models didn't work for me, but I thought that was because the models are too big. @danieloi

apssouza22 avatar May 02 '25 09:05 apssouza22

Same here @apssouza22, I get this error when I want word-level timestamps and use the distil variants in transformers.js:

Error: Layer index 6 is out of bounds for cross attentions (length 4).
    at webpack://@huggingface/transformers/./src/models.js:3498:1
    at Array.map ()
    at Function._extract_token_timestamps (webpack://@huggingface/transformers/./src/models.js:3496:30)
    at Function.generate (webpack://@huggingface/transformers/./src/models.js:3442:1)
    at async Function._call_whisper (webpack://@huggingface/transformers/./src/pipelines.js:1867:1)
    ...
  message: 'Layer index 6 is out of bounds for cross attentions (length 4).'}

danieloi avatar May 02 '25 09:05 danieloi
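
The error above suggests that the alignment heads used for timestamp extraction point at a decoder layer the model does not have (layer index 6, while only 4 layers of cross-attentions were returned). Below is a minimal sketch of a pre-flight check for that condition. It assumes the heads are available as `[layer, head]` pairs, as in the `alignment_heads` field of a Whisper generation config, and that the decoder layer count is known. `findOutOfRangeHeads` is a hypothetical helper, not part of Transformers.js, and the sample values are made up for illustration.

```javascript
// Hypothetical helper: return the [layer, head] pairs whose layer index
// does not exist in a decoder with `numDecoderLayers` layers.
function findOutOfRangeHeads(alignmentHeads, numDecoderLayers) {
  return alignmentHeads.filter(
    ([layer]) => layer < 0 || layer >= numDecoderLayers,
  );
}

// Made-up values that mirror the error message: a head on layer 6,
// checked against a decoder that only has 4 layers.
const heads = [[2, 1], [3, 0], [6, 6]];
const outOfRange = findOutOfRangeHeads(heads, 4);
console.log(outOfRange);
```

If the returned list is not empty, word-level timestamps will probably fail for that checkpoint, and falling back to segment-level timestamps with `return_timestamps: true` may be the safer option until the configuration is fixed.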

@danieloi Yeah, I got the same error. Let me know if you manage to get it sorted.

apssouza22 avatar May 02 '25 09:05 apssouza22