Nicolas Patry comments

Results: 978 comments by Nicolas Patry

WhisperTimeStampLogitsProcessor error while using Whisper pipelines. Was WhisperTimeStampLogitsProcessor used?

I'm using this audio file to test with your script: https://github.com/frankiedrake/demo/blob/master/whisper_test.wav

Thanks, I have been able to reproduce it. It's definitely linked to batching, since it works with `batch_size=1`. Working on a fix.

Ok, the issue is that the model uses `50256` for padding, or silence. @ArthurZucker, should we make this a special token? (This would mean it would be ignored in...
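A minimal sketch of why batching matters here, assuming batched outputs are right-padded with `50256` so every row has the same length: any per-sequence logic (such as timestamp processing) sees those pad ids unless they are removed or treated as special. With `batch_size=1` no padding is added, which fits the observation that single-item runs work. The helper `strip_padding` and the token ids are hypothetical illustrations, not part of `transformers`.

```python
# Padding id quoted in the comment above; treat it as an assumption and
# read `pad_token_id` from the checkpoint's generation config in practice.
PAD_ID = 50256


def strip_padding(batch: list[list[int]], pad_id: int = PAD_ID) -> list[list[int]]:
    """Drop trailing pad ids from each sequence of a right-padded batch."""
    stripped = []
    for seq in batch:
        end = len(seq)
        # Walk backwards over the padding at the end of the row.
        while end > 0 and seq[end - 1] == pad_id:
            end -= 1
        stripped.append(seq[:end])
    return stripped


# Two hypothetical generated sequences; the shorter one was padded so the
# batch is rectangular.
batch = [
    [11, 22, 33, 44],
    [11, 22, PAD_ID, PAD_ID],
]
print(strip_padding(batch))  # [[11, 22, 33, 44], [11, 22]]
```

Only trailing pads are removed, so a pad id appearing mid-sequence (if the model emits it for silence) is left in place for downstream logic to decide on.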

This is the issue: https://huggingface.co/openai/whisper-large-v2/blob/main/generation_config.json#L124 @melihogutcen A fix is coming.

Proposed changes:
- https://huggingface.co/openai/whisper-base/discussions/12
- https://huggingface.co/openai/whisper-large/discussions/29
- https://huggingface.co/openai/whisper-medium/discussions/12
- https://huggingface.co/openai/whisper-large-v2/discussions/30
- https://huggingface.co/openai/whisper-small/discussions/19
- https://huggingface.co/openai/whisper-tiny/discussions/9

Thanks. Any chance we could see the files? Otherwise, if you could print `previous_tokens` just before this error, that would be nice. This error occurs when the state machine still...

@melihogutcen This is Turkish, on `whisper-large-v2`, correct? I'll run a batch on some dataset to try and trigger it elsewhere. Still using the same script as above...

@devxpy I have reproduced the issue with your example. It seems this model never outputs timestamps. I am guessing it was fine-tuned without timestamps, so the error is somewhat expected....
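One way to check whether a checkpoint ever emits timestamps is to scan its generated ids for timestamp tokens. This is a minimal sketch, assuming Whisper's timestamp tokens occupy a contiguous id range starting at `timestamp_begin` (as we understand it, commonly taken as one past `no_timestamps_token_id` in the generation config). The helper names and the ids below are hypothetical.

```python
def count_timestamp_tokens(token_ids: list[int], timestamp_begin: int) -> int:
    """Count ids at or above `timestamp_begin`, i.e. timestamp tokens."""
    return sum(1 for t in token_ids if t >= timestamp_begin)


def outputs_timestamps(batch: list[list[int]], timestamp_begin: int) -> bool:
    """True if any generated sequence contains at least one timestamp token."""
    return any(count_timestamp_tokens(seq, timestamp_begin) > 0 for seq in batch)


# Hypothetical vocabulary where timestamps start at id 1000.
TS_BEGIN = 1000
with_timestamps = [[1000, 5, 6, 1010]]   # <|t0|> text text <|t1|>
without_timestamps = [[5, 6, 7], [8, 9]]  # plain text only

print(outputs_timestamps(with_timestamps, TS_BEGIN))     # True
print(outputs_timestamps(without_timestamps, TS_BEGIN))  # False
```

If every sequence from a fine-tuned model comes back `False`, a timestamp-based post-processor has nothing to work with, which is consistent with the error above.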

https://github.com/huggingface/transformers/pull/22475/files

`clean_up_tokenization` too many false positives

> holy grail of `original == decode(encode(original))`

The Bloom tokenizer achieves this, if you're looking for it, with the exception of a very old default: https://github.com/huggingface/transformers/pull/20846 @ArthurZucker I feel really...
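To see how a cleanup step breaks the `original == decode(encode(original))` property, here is a minimal sketch. The replacement rules are modeled on the whitespace-removing rules of `clean_up_tokenization` in `transformers`, reproduced from memory for illustration (check the library source for the exact list); the byte-level `encode`/`decode` pair is a hypothetical lossless tokenizer, not Bloom's.

```python
# Rules modeled on transformers' `clean_up_tokenization` (illustrative copy).
CLEANUP_RULES = [
    (" .", "."),
    (" ?", "?"),
    (" !", "!"),
    (" ,", ","),
    (" ' ", "'"),
    (" n't", "n't"),
    (" 'm", "'m"),
    (" 's", "'s"),
    (" 've", "'ve"),
    (" 're", "'re"),
]


def clean_up_tokenization(text: str) -> str:
    """Apply each whitespace-removing rule in order."""
    for old, new in CLEANUP_RULES:
        text = text.replace(old, new)
    return text


def encode(text: str) -> list[int]:
    """Hypothetical lossless tokenizer: one id per UTF-8 byte."""
    return list(text.encode("utf-8"))


def decode(ids: list[int], clean_up: bool = False) -> str:
    text = bytes(ids).decode("utf-8")
    return clean_up_tokenization(text) if clean_up else text


def round_trips(text: str, clean_up: bool) -> bool:
    """Check the original == decode(encode(original)) property."""
    return decode(encode(text), clean_up=clean_up) == text


# French typography puts a space before "?" and "!", so cleanup
# "fixes" text that was already correct: a false positive.
sample = "Vraiment ? Oui !"
print(round_trips(sample, clean_up=False))    # True
print(round_trips(sample, clean_up=True))     # False
print(decode(encode(sample), clean_up=True))  # Vraiment? Oui!
```

The tokenizer itself is lossless; it is the post-hoc string rewriting that loses information, because it cannot tell a tokenizer-introduced space from one that was in the input.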