Is it possible to add a TranscriptionSegment callback?

Open Josscii opened this issue 1 year ago • 4 comments

During file transcription, it would be more convenient to get a callback for each TranscriptionSegment; the TranscriptionProgress callback is not that helpful.

Josscii avatar Jun 05 '24 02:06 Josscii

It would indeed be very useful to get a TranscriptionSegment, with the word timestamps when available. Currently the TranscriptionProgress text contains raw strings such as <|startoftranscript|><|pl|><|transcribe|><|0.00|> Jeżeli zastanawiajcie się, which aren't easy to parse.
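As a stopgap, the raw text can be split on the <|...|> special tokens. The following is only a rough sketch: ParsedSegment and parseRawTranscript are hypothetical names, not WhisperKit API, and it assumes that any special token whose content parses as a finite number is a timestamp in seconds.

```swift
import Foundation

// Hypothetical helper type, not part of WhisperKit.
struct ParsedSegment: Equatable {
    let start: Double
    let end: Double?   // nil while the closing timestamp has not arrived yet
    let text: String
}

// Splits raw decoder text on <|...|> tokens. Numeric tokens are treated as
// timestamps (an assumption); all other special tokens are dropped.
func parseRawTranscript(_ raw: String) -> [ParsedSegment] {
    guard let regex = try? NSRegularExpression(pattern: "<\\|([^|]*)\\|>") else { return [] }
    let ns = NSString(string: raw)
    var segments: [ParsedSegment] = []
    var openStart: Double? = nil
    var pendingText = ""
    var cursor = 0

    for match in regex.matches(in: raw, range: NSRange(location: 0, length: ns.length)) {
        // Collect the plain text between the previous token and this one.
        pendingText += ns.substring(with: NSRange(location: cursor, length: match.range.location - cursor))
        cursor = match.range.location + match.range.length

        let token = ns.substring(with: match.range(at: 1))
        guard let time = Double(token), time.isFinite else { continue } // e.g. <|pl|>, <|transcribe|>

        let text = pendingText.trimmingCharacters(in: .whitespacesAndNewlines)
        if let start = openStart, !text.isEmpty {
            // Text enclosed by two timestamps: a finished segment.
            segments.append(ParsedSegment(start: start, end: time, text: text))
            openStart = nil
        } else {
            // No text collected yet, so this timestamp opens a new segment.
            openStart = time
        }
        pendingText = ""
    }

    // Trailing text after the last token belongs to a segment still being decoded.
    pendingText += ns.substring(from: cursor)
    let tail = pendingText.trimmingCharacters(in: .whitespacesAndNewlines)
    if let start = openStart, !tail.isEmpty {
        segments.append(ParsedSegment(start: start, end: nil, text: tail))
    }
    return segments
}

let segments = parseRawTranscript("<|startoftranscript|><|pl|><|transcribe|><|0.00|> Jeżeli zastanawiajcie się")
print(segments)
```

This recovers the text and start time, but it cannot supply word timestamps, which is part of why a proper segment callback would still be preferable.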

ldenoue avatar Jun 05 '24 15:06 ldenoue

We're currently not building the segments before a window completes, but it may be possible to return one as soon as we see two timestamp tokens surrounding text come through. Would you prefer a separate callback for this, or a configurable parameter on the existing callback, e.g. callbackInterval: .token or callbackInterval: .segment?
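To make the configurable-parameter option concrete, here is a minimal sketch of how it might behave. Every type below (CallbackInterval, DecodedToken, Update, SegmentEmitter) is hypothetical and only illustrates the idea; it is not how WhisperKit is implemented. In .segment mode the emitter buffers text and fires once a second timestamp token closes it.

```swift
import Foundation

// All names here are hypothetical illustrations, not WhisperKit API.
enum CallbackInterval { case token, segment }

enum DecodedToken: Equatable {
    case timestamp(Double)
    case text(String)
}

enum Update: Equatable {
    case token(DecodedToken)
    case segment(start: Double, end: Double, text: String)
}

struct SegmentEmitter {
    private let interval: CallbackInterval
    private let callback: (Update) -> Void
    private var openStart: Double?   // timestamp that opened the current segment
    private var buffer = ""          // text seen since that timestamp

    init(interval: CallbackInterval, callback: @escaping (Update) -> Void) {
        self.interval = interval
        self.callback = callback
    }

    mutating func consume(_ token: DecodedToken) {
        if interval == .token {
            // Per-token mode: forward everything immediately.
            callback(.token(token))
            return
        }
        switch token {
        case .text(let piece):
            // Only buffer text once a segment has been opened by a timestamp.
            if openStart != nil { buffer += piece }
        case .timestamp(let time):
            let text = buffer.trimmingCharacters(in: .whitespaces)
            if let start = openStart, !text.isEmpty {
                // Two timestamps now surround text: emit the segment.
                callback(.segment(start: start, end: time, text: text))
                openStart = nil
            } else {
                openStart = time
            }
            buffer = ""
        }
    }
}

var received: [Update] = []
var emitter = SegmentEmitter(interval: .segment) { received.append($0) }
for token in [DecodedToken.timestamp(0.0), .text(" Hello"), .text(" world"), .timestamp(2.0)] {
    emitter.consume(token)
}
print(received)
```

A separate callback would look much the same internally; the difference is mainly whether callers get one closure with a mode switch or two closures with distinct payload types.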

ZachNagengast avatar Jun 05 '24 19:06 ZachNagengast

What's the relationship of the TranscriptionProgress callback and the TranscriptionSegment callback?

Will they be called at the same time? If so, they could be merged into one. If not, they should be separate.

Josscii avatar Jun 06 '24 00:06 Josscii

I would prefer a separate callback that returns TranscriptionSegment structs.

ldenoue avatar Jun 06 '24 14:06 ldenoue

I believe this is complete now with https://github.com/argmaxinc/WhisperKit/pull/240

ZachNagengast avatar Apr 18 '25 23:04 ZachNagengast