
feature: audio transcription support

Open saracarl opened this issue 8 years ago • 2 comments

Support for audio transcriptions.

Please add notes here for customers requesting this feature so we can gauge interest.

saracarl, Apr 08 '16 13:04

Audio transcription requires the ability to play audio. That's pretty hard, though Wikisource and Scripto have done some experimental work on it.

However, IIIF is starting work on "time-based media", which includes audio. My suggestion is that we wait a year or two, watch that effort, and then use it.

benwbrum, Apr 08 '16 13:04

We've had some more conversations and learned a lot more about A/V since this issue was closed. Reopening to capture design ideas:

Draft Workflow & Questions

Import

  • Project owner imports audio file by reference from an authorized source
    • Q: Which sources are allowed? We need to be able to embed them within a player we can control
  • System dispatches the audio to an AI-based transcription service to return timestamps, raw text, and (possibly) speakers.
    • Q: Which AI service should we use? Do any return speakers or speaker transitions?
  • System waits for the AI response
  • On receiving the AI response, the system segments the audio file into "pages", which correspond to phrase boundaries/pauses.
  • System creates a new work from the audio and the response. The work contains standard pages, each with a reference to the audio region being transcribed. Each page text contains the timestamps/speaker/text from the AI transcript. The audio file is now ready for human transcription.
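The segmentation step above could be sketched roughly as follows. This is only an illustration under assumptions: the `Segment` and `AudioPage` types, the `segment_into_pages` function, and the `min_pause` threshold are all hypothetical names, and the shape of the AI response (start/end times in seconds, a speaker label, and text per phrase) is assumed, since no service has been chosen yet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One phrase from the (assumed) AI response."""
    start: float   # seconds from the beginning of the audio
    end: float     # seconds from the beginning of the audio
    speaker: str
    text: str


@dataclass
class AudioPage:
    """A hypothetical 'page': a contiguous audio region plus its phrases."""
    start: float
    end: float
    segments: List[Segment]


def segment_into_pages(segments: List[Segment], min_pause: float = 1.5) -> List[AudioPage]:
    """Start a new page whenever the silence between two phrases
    is at least `min_pause` seconds (an assumed threshold)."""
    pages: List[AudioPage] = []
    current: List[Segment] = []
    for seg in segments:
        if current and seg.start - current[-1].end >= min_pause:
            pages.append(AudioPage(current[0].start, current[-1].end, current))
            current = []
        current.append(seg)
    if current:
        pages.append(AudioPage(current[0].start, current[-1].end, current))
    return pages
```

A real implementation might also split on speaker transitions or cap page length; the pause threshold here is just one possible boundary rule.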

Transcription

  • Users are presented with one page of audio at a time to transcribe, based on the segmentation done during import. The transcription screen will contain the audio player in place of the page image, and a set of timestamp/speaker/text fields corresponding to the AI response for this page.
  • Clicking on a timestamp or editing the text associated with the timestamp will play the audio at that point.
  • Users will be able to change the speaker (with autocomplete/dropdown).
  • Users may not be able to edit timestamps.
  • Users will be able to transcribe afresh or possibly edit the text transcript.
  • Saving the page will update the status of the page and work.
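To make the editing rules above concrete, here is a minimal sketch, assuming each page holds a list of timestamp/speaker/text lines. The `Line` and `Page` types, the function names, and the status strings are hypothetical and do not reflect the existing data model. Timestamps are deliberately left out of the edit function, matching the idea that users may not be able to change them.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Line:
    """One timestamp/speaker/text row on the transcription screen (assumed shape)."""
    start: float   # seconds; read-only for users in this sketch
    speaker: str
    text: str


@dataclass
class Page:
    lines: List[Line]
    status: str = "untranscribed"   # hypothetical status value


def seek_position(page: Page, index: int) -> float:
    """Where the player should start when a line's timestamp is clicked."""
    return page.lines[index].start


def edit_line(page: Page, index: int,
              speaker: Optional[str] = None,
              text: Optional[str] = None) -> None:
    """Update speaker and/or text; the timestamp is never changed."""
    old = page.lines[index]
    page.lines[index] = replace(
        old,
        speaker=old.speaker if speaker is None else speaker,
        text=old.text if text is None else text,
    )


def save_page(page: Page) -> str:
    """Mark the page as transcribed on save (hypothetical status value)."""
    page.status = "transcribed"
    return page.status
```

Whether saving should also roll the status up to the work, and which status values apply to audio pages, is left open here.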

Open Questions

benwbrum, Aug 29 '22 17:08