transformers
No `timestamps` from `TFWhisperForConditionalGeneration` with `predict_timestamps=True`
System Info
Latest version, on Google Colab.
Who can help?
@sanchit-gandhi @connor-henderson
Information
I am trying to convert TensorFlow Whisper to TFLite, but it turns out that TFWhisper does not output timestamp tokens.
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
import tensorflow as tf
# Importing necessary classes from transformers
from transformers import WhisperProcessor, WhisperFeatureExtractor, TFWhisperForConditionalGeneration, WhisperTokenizer
# Importing necessary functions from datasets
from datasets import load_dataset
# Creating the feature extractor and tokenizer from the pretrained checkpoint, then combining them into a processor
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny.en")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny.en", predict_timestamps=True)
processor = WhisperProcessor(feature_extractor, tokenizer)
# Creating an instance of TFWhisperForConditionalGeneration from the pretrained model
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
# Loading dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
# Inputs
inputs = processor(ds[0]["audio"]["array"], return_tensors="tf")
input_features = inputs.input_features
# Generating Transcription
generated_ids = model.generate(input_features=input_features, return_timestamps=True)
transcription = processor.tokenizer.decode(generated_ids[0], decode_with_timestamps=True)
print(transcription)
<|startoftranscript|><|notimestamps|> Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.<|endoftext|>
Expected behavior
The same tokenizer with predict_timestamps=True works as expected in PyTorch:
import torch
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], return_tensors="pt")
input_features = inputs.input_features
generated_ids = model.generate(inputs=input_features, return_timestamps=True)
transcription = processor.tokenizer.decode(generated_ids[0], decode_with_timestamps=True)
transcription
<|startoftranscript|><|0.00|> Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.<|5.44|><|endoftext|>
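To compare the two outputs above without eyeballing them, here is a minimal sketch that pulls timestamp tokens out of a decoded string. It assumes timestamps always appear in the `<|0.00|>` form shown above; `extract_timestamps` and `TIMESTAMP_RE` are hypothetical helper names, not part of transformers.

```python
import re
from typing import List

# Matches timestamp tokens as they appear in the decoded text, e.g. <|0.00|> or <|5.44|>.
# Assumption: timestamps are always printed with two decimal places, as in the outputs above.
TIMESTAMP_RE = re.compile(r"<\|(\d+\.\d{2})\|>")


def extract_timestamps(decoded: str) -> List[float]:
    """Return every timestamp (in seconds) found in a decoded transcription string."""
    return [float(value) for value in TIMESTAMP_RE.findall(decoded)]


# The two decoded strings reported in this issue.
tf_output = (
    "<|startoftranscript|><|notimestamps|> Mr. Quilter is the apostle of the middle "
    "classes, and we are glad to welcome his gospel.<|endoftext|>"
)
pt_output = (
    "<|startoftranscript|><|0.00|> Mr. Quilter is the apostle of the middle "
    "classes, and we are glad to welcome his gospel.<|5.44|><|endoftext|>"
)

print(extract_timestamps(tf_output))  # [] (no timestamp tokens in the TF output)
print(extract_timestamps(pt_output))  # [0.0, 5.44]
```

An empty list for the TF output and two values for the PyTorch output is the mismatch this issue is about.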
Will be closed by https://github.com/huggingface/transformers/pull/21334
@ArthurZucker do you maybe have some time to finish https://github.com/huggingface/transformers/pull/21334? Alternatively we can open this one up to the community if not!
Yep let's open it to the community I'm a bit short on time 😓
If you come across this feature request and are interested in having a go, that's awesome, it's great to see! Feel free to resume the PR @ArthurZucker started at #21334 - it provides the scaffold you need to add this feature!
Sure, I’ll dig it up and share tomorrow … it is only transcribing for now but with timestamps.
@nyadla-sys https://colab.research.google.com/drive/1qXcgILcA-HPEYqAYPrxQQ1TRwXerErDk?usp=sharing
Hey @nyadla-sys - cool to see that you're using the TF model for inference! Could I respectfully ask that we try and keep the GitHub issue thread relevant to the issue being discussed? For other TF / TFLite issues, you can either open a new issue or open a post on the forum: https://discuss.huggingface.co
Thanks!
Removed my comments, sorry for the spam.
May I please try to tackle this with @0525hhgus on the weekends? Thank you and I hope you all have a great weekend!
Of course! Feel free to continue @ArthurZucker's PR https://github.com/huggingface/transformers/pull/21334 - it's already in a good state regarding TF Whisper timestamps. You just need to do the TF Whisper part, the Flax part has been merged already :)
Alternatively you can open a new PR and copy across the relevant code changes if you're more comfortable doing that. Feel free to tag myself and Arthur in any PR you work on - we're on hand to help with questions / queries!
@makaveli10 Were you able to solve the word timestamps on Whisper TFLite? I am running into the same problem.