audiobookshelf
[Enhancement]: Adding Transcription/Subtitle Viewing Support
Describe the feature/enhancement
Transcription/Subtitle support
Summary
Add initial support for transcriptions. Apple now supports transcriptions in podcasts.
Some audiobook files have transcriptions, and currently, we can use tools based on Whisper to transcribe audio to text.
In fact, most software based on Whisper supports transcribing audio to text and exporting it as an SRT or VTT file. VTT is a native format for the web, and SRT is a common format for subtitles.
I'm creating this issue to discuss the best way to implement transcription support in the web player. I'm trying to implement some of these features in pull request #
Podcast transcription is supported by:
- [ ] https://podcasting2.org/podcast-namespace/tags/transcript
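For reference, the `<podcast:transcript>` tag linked above carries a `url` and a `type`, plus optional `language` and `rel` attributes. Below is a minimal sketch of collecting those from the XML of a single RSS `<item>`. The `TranscriptRef` type and `extractTranscripts` function are hypothetical names for illustration, and the regex scan is a simplification: a real implementation would more likely go through a proper XML/feed parser.

```typescript
// Hypothetical shape for one transcript reference found in a feed item.
interface TranscriptRef {
  url: string;
  type: string;
  language?: string;
  rel?: string;
}

// Scan the raw XML of one <item> for <podcast:transcript ...> tags.
// Assumption: attributes use double quotes; XML entities in attribute
// values (such as &amp;) are NOT decoded here.
function extractTranscripts(itemXml: string): TranscriptRef[] {
  const refs: TranscriptRef[] = [];
  const tagPattern = /<podcast:transcript\b([^>]*)>/g;
  let tag: RegExpExecArray | null;
  while ((tag = tagPattern.exec(itemXml)) !== null) {
    const attrs: { [name: string]: string } = {};
    const attrPattern = /([\w:-]+)\s*=\s*"([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrPattern.exec(tag[1])) !== null) {
      attrs[attr[1]] = attr[2];
    }
    // url and type are the required attributes; skip tags missing either.
    if (attrs.url && attrs.type) {
      refs.push({
        url: attrs.url,
        type: attrs.type,
        language: attrs.language,
        rel: attrs.rel,
      });
    }
  }
  return refs;
}
```

An episode can list several transcripts (for example one VTT and one SRT), so the function returns all of them and leaves the choice of format to the caller.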
Possible tasks:
- [ ] Implement support for parsing and displaying VTT files. WebVTT #2918
- [ ] Implement support for parsing and displaying SRT files (convert to vtt or try to parse directly?).
- [ ] Support the `<podcast:transcript>` tag in RSS feeds. Apple Docs / Podcast Namespace
- [ ] Handle CORS issues when fetching transcription files from external sources. Podcasting 2.0 CORS
- [ ] Implement logic to read transcription from a file with the same name as the audio file. #2918
- [ ] Define priority if both VTT and SRT files exist with the same name?
- [ ] When a podcast has a transcription tag, automatically download the transcription and store it in the file system (offline mode).
- [ ] Add a visual indication in the UI that a podcast/audiobook has a transcription available?
- [ ] Implement a feature to search within the transcription (better if using a lateral panel)?
- [ ] Implement a feature to highlight the current line of the transcription as the podcast/audiobook plays. #2918
- [ ] Implement a feature to navigate to a specific part of the podcast/audiobook by clicking on the transcription text. Seek to the corresponding time in the audio. #2918
- [ ] Implement a feature to toggle the display of the transcription on/off on the web player. #2918
- [ ] Implement a button to download the transcription file? Can be useful for editing or sharing. #2918
- [ ] Allow uploading a VTT file for transcription?
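On the "convert to vtt or try to parse directly?" question in the tasks above, one option is to convert SRT to VTT first so that only one parser is needed. The sketch below assumes that approach; `Cue`, `srtToVtt`, `parseTimestamp`, and `parseVtt` are hypothetical names, and the parser handles only plain cues (it ignores cue settings, styling, and regions).

```typescript
// One transcript line, with start/end times in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Convert SRT text to WebVTT: add the "WEBVTT" header and change the
// millisecond separator from "," to "." on timing lines only.
function srtToVtt(srt: string): string {
  const lines = srt
    .replace(/^\uFEFF/, "") // drop a leading byte-order mark, if any
    .replace(/\r\n?/g, "\n")
    .split("\n")
    .map((line) =>
      line.indexOf("-->") !== -1 ? line.replace(/,(\d{3})/g, ".$1") : line
    );
  return "WEBVTT\n\n" + lines.join("\n").trim() + "\n";
}

// Parse "hh:mm:ss.mmm" or "mm:ss.mmm" into seconds.
function parseTimestamp(ts: string): number {
  return ts
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Parse WebVTT text into cues. Blocks without a timing line
// (the header, NOTE blocks, STYLE blocks) are skipped.
function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    let timingIndex = -1;
    for (let i = 0; i < lines.length; i++) {
      if (lines[i].indexOf("-->") !== -1) {
        timingIndex = i;
        break;
      }
    }
    if (timingIndex === -1) continue;
    const match = lines[timingIndex].match(/^\s*([\d:.]+)\s+-->\s+([\d:.]+)/);
    if (!match) continue;
    cues.push({
      start: parseTimestamp(match[1]),
      end: parseTimestamp(match[2]),
      text: lines.slice(timingIndex + 1).join("\n").trim(),
    });
  }
  return cues;
}
```

With this, an SRT file can be loaded as `parseVtt(srtToVtt(srtText))`. SRT cue numbers survive the conversion as VTT cue identifiers, which the parser skips because they sit above the timing line.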
Note: I think we need to define a standard for multi-language transcriptions. For example, use a prefix in the file name such as `en-` for English and `es-` for Spanish.
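To make the naming question concrete, here is a rough sketch of how a lookup could work under the prefix idea above. Everything in it is an assumption for discussion: the `findTranscript` name, the exact `<lang>-<audio name>.<ext>` pattern, the fallback to an unprefixed file, and VTT taking priority over SRT (the task list leaves that priority open).

```typescript
// Pick a transcript file for an audio file from the files in the same folder.
// Assumed lookup order: language-prefixed VTT, language-prefixed SRT,
// then unprefixed VTT, unprefixed SRT. Matching is case-insensitive.
function findTranscript(
  audioFile: string,
  siblings: string[],
  language?: string
): string | undefined {
  const base = audioFile.replace(/\.[^./\\]+$/, ""); // strip the extension
  const extensions = ["vtt", "srt"]; // assumed priority: VTT first
  const prefixes = language ? [language + "-", ""] : [""];

  const candidates: string[] = [];
  for (const prefix of prefixes) {
    for (const ext of extensions) {
      candidates.push((prefix + base + "." + ext).toLowerCase());
    }
  }

  const lowerSiblings = siblings.map((name) => name.toLowerCase());
  for (const candidate of candidates) {
    const index = lowerSiblings.indexOf(candidate);
    if (index !== -1) return siblings[index];
  }
  return undefined;
}
```

For example, with `chapter01.srt`, `chapter01.vtt`, and `es-chapter01.srt` next to `chapter01.mp3`, asking for `"es"` would return `es-chapter01.srt`, and asking with no language would return `chapter01.vtt`.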
UI Ideas on the Web Player:
What's the best way to display the transcription on the web player?
- [ ] Above/Below audio player controls. #2918
- Good: because the audio player is omnipresent on all pages and the transcription can be displayed in a fixed position.
- Bad: Not enough space to display multiple lines of text.
- [ ] Modal.
- Good: More space to display multiple lines of text. Can float over the UI.
- Bad: The modal can be intrusive.
- [ ] Lateral Panel (like iTunes/Apple Music).
- Good: More space to display multiple lines of text. Can have a search feature. Better for implementing the seek feature (click on a line and seek to the corresponding time).
- Bad: Takes up space on the screen. Not good for small screens.
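Whichever layout is chosen, highlighting the current line and click-to-seek both come down to mapping between playback time and cues. A minimal sketch follows, assuming cues are sorted by start time and do not overlap; `Cue`, `SeekablePlayer`, `findActiveCue`, and `seekToCue` are hypothetical names, not existing code.

```typescript
// One transcript line, with start/end times in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Anything with a settable currentTime in seconds.
// A browser HTMLAudioElement satisfies this structurally.
interface SeekablePlayer {
  currentTime: number;
}

// Binary search for the cue that is active at `time`.
// Returns its index, or -1 when `time` falls in a gap between cues.
// Assumes `cues` is sorted by start time and cues do not overlap.
function findActiveCue(cues: Cue[], time: number): number {
  let lo = 0;
  let hi = cues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const cue = cues[mid];
    if (time < cue.start) {
      hi = mid - 1;
    } else if (time >= cue.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}

// Click-to-seek: jump playback to the start of the clicked cue.
function seekToCue(player: SeekablePlayer, cue: Cue): void {
  player.currentTime = cue.start;
}
```

In the web player, `findActiveCue` could be called from the audio element's `timeupdate` event to decide which line to highlight, and `seekToCue` from a click handler on each transcript line. The binary search keeps the lookup cheap for long audiobooks with thousands of cues.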
Related
- #1723
IMHO, when it comes to the UI you should combine both: the big panel for browsing, and the panel below the controls, perhaps showing just the current line but bigger. You have to take accessibility into account; some folks will want it to be resizable.