fromthepage
feature: audio transcription support
Support for audio transcriptions.
Please add notes here for customers requesting this feature so we can gauge interest.
Audio transcription requires the ability to play audio. That's hard to do well, though Wikisource and Scripto have done some experimental work on it.
However, IIIF is starting work on "time-based media", which includes audio. My suggestion is that we wait a year or two, watch that effort, and then use it.
We've had some more conversations and learned a lot more about A/V since this issue was closed. Reopening to capture design ideas:
Draft Workflow & Questions
Import
- Project owner imports an audio file by reference from an authorized source
- Q: Which sources are allowed? We need to be able to embed them within a player we can control
- System dispatches the audio to an AI-based transcription service, which returns timestamps, raw text, and (possibly) speakers.
- Q: Which AI service should we use? Do any return speakers or speaker transitions?
- System waits for the AI response
- On receiving the AI response, the system segments the audio file into "pages" whose boundaries fall at phrase boundaries or pauses.
- System creates a new work from the audio and the response. The work contains standard pages, each with a reference to the audio region being transcribed. Each page's text contains the timestamps, speakers, and text from the AI transcript. The audio file is now ready for human transcription.
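A rough sketch of the segmentation step, assuming the AI service returns a flat list of timestamped segments. The `Segment` and `AudioPage` shapes, the 1-second pause threshold, and the 60-second page cap are all hypothetical choices for illustration, not taken from any particular service or from the FromThePage codebase. A new page starts at a long enough pause, or when a page would otherwise grow too long.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One timestamped chunk of an AI transcript (hypothetical shape)."""
    start: float                   # seconds from the start of the audio
    end: float                     # seconds from the start of the audio
    text: str
    speaker: Optional[str] = None  # not every service returns speakers


@dataclass
class AudioPage:
    """A page referencing a region of the audio file (hypothetical model)."""
    audio_url: str
    start: float
    end: float
    lines: List[Segment] = field(default_factory=list)


def segment_into_pages(audio_url: str, segments: List[Segment],
                       min_pause: float = 1.0,
                       max_duration: float = 60.0) -> List[AudioPage]:
    """Group consecutive segments into pages, breaking at pauses.

    A new page starts when the silence before a segment is at least
    `min_pause` seconds, or when adding the segment would make the
    current page longer than `max_duration` seconds.
    """
    pages: List[AudioPage] = []
    current: Optional[AudioPage] = None
    for seg in sorted(segments, key=lambda s: s.start):
        if current is not None:
            pause = seg.start - current.end
            too_long = seg.end - current.start > max_duration
            if pause >= min_pause or too_long:
                pages.append(current)
                current = None
        if current is None:
            current = AudioPage(audio_url, seg.start, seg.end)
        current.lines.append(seg)
        current.end = max(current.end, seg.end)
    if current is not None:
        pages.append(current)
    return pages


# Usage: a 0.2 s gap stays on one page; a 2 s gap starts a new one.
segs = [
    Segment(0.0, 2.5, "Good morning.", "Speaker 1"),
    Segment(2.7, 5.0, "Thanks for coming.", "Speaker 1"),
    Segment(7.0, 9.0, "Happy to be here.", "Speaker 2"),
]
pages = segment_into_pages("https://example.org/interview.mp3", segs)
print([(p.start, p.end) for p in pages])  # [(0.0, 5.0), (7.0, 9.0)]
```

Splitting only between AI segments means a page boundary never cuts a phrase in half; the cap keeps monologues without pauses from becoming one enormous page.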
Transcription
- Users are presented with one page of audio at a time to transcribe, based on the segmentation done during import. The transcription screen will contain the audio player in place of the page image, and a set of timestamp/speaker/text fields corresponding to the AI response for this page.
- Clicking a timestamp, or editing the text associated with it, will play the audio from that point.
- Users will be able to change speaker (with autocomplete/dropdown)
- Users may not be able to edit timestamps
- Users will be able to transcribe from scratch or, possibly, edit the AI text transcript
- Saving the page will update the status of the page and the work.
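A minimal sketch of the page behavior described above, assuming the stricter reading of the open question (timestamps are read-only). `TranscriptionPage`, `Line`, `suggest_speakers`, `save_page`, and the status strings are hypothetical names for illustration. Playback positions are expressed as W3C Media Fragments temporal URIs (`#t=start,end`, in seconds); in the browser the player could equally seek by setting the audio element's `currentTime`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    """One timestamp/speaker/text row from the AI response."""
    start: float              # seconds; fixed at import time
    end: float
    speaker: Optional[str]
    text: str


@dataclass
class TranscriptionPage:
    audio_url: str
    lines: List[Line]
    status: str = "new"       # hypothetical status values

    def play_from(self, index: int) -> str:
        """Media-fragment URI that plays the audio for one line."""
        line = self.lines[index]
        return f"{self.audio_url}#t={line.start:g},{line.end:g}"

    def edit_line(self, index: int, text: Optional[str] = None,
                  speaker: Optional[str] = None) -> str:
        """Change text and/or speaker, then return the URI to replay the line.

        Timestamps are deliberately not parameters here, so they can't be edited.
        """
        line = self.lines[index]
        if text is not None:
            line.text = text
        if speaker is not None:
            line.speaker = speaker
        return self.play_from(index)


def suggest_speakers(known: List[str], prefix: str) -> List[str]:
    """Autocomplete: known speakers whose names start with the typed prefix."""
    p = prefix.casefold()
    return sorted(s for s in set(known) if s.casefold().startswith(p))


def save_page(page: TranscriptionPage, work_pages: List[TranscriptionPage]) -> str:
    """Mark the page transcribed and return the resulting work status."""
    page.status = "transcribed"
    if all(p.status == "transcribed" for p in work_pages):
        return "transcribed"
    return "in progress"


# Usage
page = TranscriptionPage("https://example.org/interview.mp3", [
    Line(0.0, 2.5, "Speaker 1", "Good morning."),
    Line(2.7, 5.0, None, "Thanks for coming."),
])
print(page.edit_line(1, speaker="Speaker 2"))  # https://example.org/interview.mp3#t=2.7,5
```

Keeping the work status derived from page statuses, rather than stored separately, avoids the two drifting apart when pages are saved out of order.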