WhisperKit
Is it possible to add a TranscriptionSegment callback?
During file transcription, it would be more convenient to get a callback with each TranscriptionSegment; the TranscriptionProgress callback is not that helpful.
It would indeed be very useful to get TranscriptionSegment, with word timestamps when available. Currently the TranscriptionProgress text contains raw strings such as <|startoftranscript|><|pl|><|transcribe|><|0.00|> Jeżeli zastanawiajcie się, which aren't easy to parse.
We currently don't build segments before a window completes, but it may be possible to return one as soon as we see two timestamp tokens surrounding text come through. Would you prefer a separate callback for this, or a configurable parameter on the existing callback, e.g. callbackInterval: .token or callbackInterval: .segment?
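To make the "two timestamp tokens surrounding text" idea concrete, here is a rough sketch of how completed segments could be picked out of the raw decoded text. It is only an illustration under assumptions: SketchSegment and completedSegments(in:) are hypothetical names, not WhisperKit types or API, and it assumes timestamp tokens always look like <|0.00|>.

```swift
import Foundation

// Hypothetical stand-in for illustration only; this is NOT WhisperKit's TranscriptionSegment.
struct SketchSegment: Equatable {
    let start: Double
    let end: Double
    let text: String
}

/// Returns every non-empty span of text enclosed by two consecutive timestamp tokens.
/// Text after the last timestamp token is treated as still in progress and is not returned.
func completedSegments(in raw: String) -> [SketchSegment] {
    // Assumption: timestamp tokens have the form <|12.34|>.
    let timestampRegex = try! NSRegularExpression(pattern: #"<\|(\d+\.\d+)\|>"#)
    let ns = raw as NSString
    let matches = timestampRegex.matches(in: raw, range: NSRange(location: 0, length: ns.length))

    var segments: [SketchSegment] = []
    // Walk over consecutive pairs of timestamp tokens: (opening, closing).
    for (opening, closing) in zip(matches, matches.dropFirst()) {
        let textStart = opening.range.location + opening.range.length
        let textRange = NSRange(location: textStart, length: closing.range.location - textStart)
        let text = ns.substring(with: textRange)
            // Drop any other special tokens, e.g. <|transcribe|>.
            .replacingOccurrences(of: #"<\|[^|]*\|>"#, with: "", options: .regularExpression)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        // Back-to-back timestamps (end of one segment, start of the next) enclose no text.
        guard !text.isEmpty,
              let start = Double(ns.substring(with: opening.range(at: 1))),
              let end = Double(ns.substring(with: closing.range(at: 1))) else { continue }
        segments.append(SketchSegment(start: start, end: end, text: text))
    }
    return segments
}

// Made-up input for illustration; the trailing " still decoding" has no closing timestamp yet.
let sample = "<|startoftranscript|><|en|><|transcribe|><|0.00|> Hello there<|2.40|><|2.40|> Second part<|5.00|><|5.00|> still decoding"
let found = completedSegments(in: sample)
```

With a string like the one quoted earlier in the thread, which has only an opening <|0.00|>, this returns nothing, which matches the point that a segment can only be reported once its closing timestamp arrives.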
What's the relationship between the TranscriptionProgress callback and the TranscriptionSegment callback?
Would they fire at the same time? If so, they could be merged into one. If not, they should be kept separate.
I would prefer a separate callback that returns TranscriptionSegment structs.
I believe this is complete now with https://github.com/argmaxinc/WhisperKit/pull/240