fromthepage feature: audio transcription support

Support for audio transcriptions.

Please add notes here for customers requesting this feature so we can gauge interest.

Apr 08 '16 13:04 saracarl

Audio transcription requires the ability to play audio. That's pretty hard to build, though Wikisource and Scripto have done some experimental work.

However, IIIF is starting work on "time-based media", which includes audio. My suggestion is that we wait a year or two, watch that effort, and then use it.

Apr 08 '16 13:04 benwbrum

We've had some more conversations and learned a lot more about A/V since this issue was closed. Reopening to capture design ideas:

Draft Work-flow & Questions

Import

Project owner imports an audio file by reference from an authorized source.
- Q: Which sources are allowed? We need to be able to embed them within a player we can control.
System dispatches the audio to an AI-based transcription service, which returns timestamps, raw text, and (possibly) speakers.
- Q: Which AI service should we use? Do any return speakers or speaker transitions?
System waits for the AI response.
On receiving the AI response, the system segments the audio file into "pages", which correspond to phrase boundaries/pauses.
System creates a new work from the audio and the response. The work contains standard pages, each with a reference to the audio region being transcribed. Each page's text contains the timestamps/speaker/text from the AI transcript. The audio file is now ready for human transcription.
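
To make the segmentation step concrete, here is a rough sketch of how an AI response could be split into "pages" at pauses. Everything in it is an assumption for discussion: the `Segment` and `Page` shapes, the `paginate` name, and the `min_pause` / `max_page_seconds` thresholds are hypothetical, and the real response format will depend on whichever AI service we pick.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One timestamped phrase from the AI response (hypothetical shape)."""
    start: float                  # seconds from the start of the audio
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class Page:
    """A region of audio plus the AI segments that fall inside it."""
    start: float
    end: float
    segments: List[Segment] = field(default_factory=list)


def paginate(segments, min_pause=1.5, max_page_seconds=60.0):
    """Group segments into pages, starting a new page at each long pause.

    min_pause and max_page_seconds are illustrative defaults, not tuned values.
    """
    pages: List[Page] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if pages:
            page = pages[-1]
            pause = seg.start - page.end
            too_long = seg.end - page.start > max_page_seconds
            if pause < min_pause and not too_long:
                # Same phrase group: extend the current page.
                page.segments.append(seg)
                page.end = max(page.end, seg.end)
                continue
        # First segment, long pause, or page length limit: start a new page.
        pages.append(Page(start=seg.start, end=seg.end, segments=[seg]))
    return pages
```

Each resulting `Page` carries the audio region (`start`, `end`) that the page would reference, plus the timestamp/speaker/text rows that would pre-fill the page text.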

Transcription

Users are presented with one page of audio at a time to transcribe, based on the segmentation done during import. The transcription screen will contain the audio player in place of the page image, and a set of timestamp/speaker/text fields corresponding to the AI response for this page.
Clicking on a timestamp or editing the text associated with the timestamp will play the audio at that point.
Users will be able to change the speaker (with autocomplete/dropdown).
Users may not be able to edit timestamps.
Users will be able to transcribe afresh or possibly edit the text transcript.
Saving the page will update the status of the page and work.
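
A minimal sketch of the editing rules above, to pin down the behavior we are proposing. The `Line` shape, the `apply_edit` and `page_status` names, and the status strings are all hypothetical and not existing FromThePage code; the sketch also assumes timestamps end up read-only, which is still undecided.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """One timestamp/speaker/text row on a transcription page (hypothetical)."""
    start: float   # seconds into the audio; treated as read-only here
    speaker: str
    text: str


# Assumption: only speaker and text are user-editable.
EDITABLE_FIELDS = {"speaker", "text"}


def apply_edit(line: Line, field_name: str, value: str) -> float:
    """Apply a user edit and return the position the player should seek to."""
    if field_name not in EDITABLE_FIELDS:
        raise PermissionError(f"{field_name} is not editable")
    setattr(line, field_name, value)
    # Editing a row plays the audio at that row's timestamp.
    return line.start


def page_status(lines: List[Line]) -> str:
    """Derive a page status on save; the status names are placeholders."""
    if lines and all(line.text.strip() for line in lines):
        return "transcribed"
    return "incomplete"
```

The return value of `apply_edit` is the seek position, so the same handler could serve both "click on a timestamp" and "edit the text" in the UI.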

Open Questions

Aug 29 '22 17:08 benwbrum
