transformers icon indicating copy to clipboard operation
transformers copied to clipboard

No ```timestamps``` from ```TFWhisperForConditionalGeneration``` with ```predict_timestamps=True```

Open makaveli10 opened this issue 1 year ago • 2 comments

System Info

Latest Version. On google Colab

Who can help?

@sanchit-gandhi @connor-henderson

Information

I am trying to convert tensorflow whisper to tflite but turns out that TFWhisper doesnt want to output timestamp tokens.

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

import tensorflow as tf


# Importing necessary classes from transformers 
from transformers import WhisperProcessor, WhisperFeatureExtractor, TFWhisperForConditionalGeneration, WhisperTokenizer

# Importing necessary functions from datasets
from datasets import load_dataset

# Creating an instance of AutoProcessor from the pretrained model
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny.en")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny.en", predict_timestamps=True)
processor = WhisperProcessor(feature_extractor, tokenizer)

# Creating an instance of TFWhisperForConditionalGeneration from the pretrained model
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# Loading dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

# Inputs
inputs = processor(ds[0]["audio"]["array"], return_tensors="tf")
input_features = inputs.input_features

# Generating Transcription
generated_ids = model.generate(input_features=input_features, return_timestamps=True)
transcription = processor.tokenizer.decode(generated_ids[0], decode_with_timestamps=True)
print(transcription)

<|startoftranscript|><|notimestamps|> Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.<|endoftext|>

Expected behavior

While the same tokenizer with predict_timestamps=True works as expected in pytorch:

import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

inputs = processor(ds[0]["audio"]["array"], return_tensors="pt")
input_features = inputs.input_features

generated_ids = model.generate(inputs=input_features, return_timestamps=True)

transcription = processor.tokenizer.decode(generated_ids[0], decode_with_timestamps=True)
transcription

<|startoftranscript|><|0.00|> Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.<|5.44|><|endoftext|>

makaveli10 avatar Jun 01 '23 05:06 makaveli10

Will be closed by https://github.com/huggingface/transformers/pull/21334

sanchit-gandhi avatar Jun 01 '23 15:06 sanchit-gandhi

@ArthurZucker do you maybe have some time to finish https://github.com/huggingface/transformers/pull/21334? Alternatively we can open this one up to the community if not!

sanchit-gandhi avatar Jul 03 '23 15:07 sanchit-gandhi

Yep let's open it to the community I'm a bit short on time 😓

ArthurZucker avatar Jul 05 '23 02:07 ArthurZucker

If you come across this feature request and are interested in having a go, that's awesome, it's great to see! Feel free to resume the PR @ArthurZucker started at #21334 - it provides the scaffold you need to add this feature!

sanchit-gandhi avatar Jul 05 '23 17:07 sanchit-gandhi

Sure, I’ll dig it up and share tomorrow … it is only transcribing for now but with timestamps.

makaveli10 avatar Jul 06 '23 17:07 makaveli10

@nyadla-sys https://colab.research.google.com/drive/1qXcgILcA-HPEYqAYPrxQQ1TRwXerErDk?usp=sharing

makaveli10 avatar Jul 07 '23 09:07 makaveli10

Hey @nyadla-sys - cool to see that you're using the TF model for inference! Could I respectfully ask that we try and keep the GitHub issue thread relevant to the issue being discussed? For other TF / TFLite issues, you can either open a new issue or open a post on the forum: https://discuss.huggingface.co

Thanks!

sanchit-gandhi avatar Jul 07 '23 16:07 sanchit-gandhi

Hey @nyadla-sys - cool to see that you're using the TF model for inference! Could I respectfully ask that we try and keep the GitHub issue thread relevant to the issue being discussed? For other TF / TFLite issues, you can either open a new issue or open a post on the forum: https://discuss.huggingface.co

Thanks!

removed my comments ,sorry to spam

nyadla-sys avatar Jul 11 '23 19:07 nyadla-sys

May I please try to tackle this with @0525hhgus on the weekends? Thank you and I hope you all have a great weekend!

wonhyeongseo avatar Jul 14 '23 02:07 wonhyeongseo

Of course! Feel free to continue @ArthurZucker's PR https://github.com/huggingface/transformers/pull/21334 - it's already in a good state regarding TF Whisper timestamps. You just need to do the TF Whisper part, the Flax part has been merged already :)

Alternatively you can open a new PR and copy across the relevant code changes if you're more comfortable doing that. Feel free to tag myself and Arthur in any PR you work on - we're on hand to help with questions / queries!

sanchit-gandhi avatar Jul 25 '23 14:07 sanchit-gandhi

@makaveli10 Did you able to solve the word timestamp on whisper tflite , I am also on the same page,

Sriramraja05 avatar Dec 07 '23 05:12 Sriramraja05