lute-v3
Somehow directly link pages or even paragraphs to audio timestamps
I'm learning Vietnamese, and sometimes only want to hear a particular thing from my audio, e.g.
The audio for this page is 1:44 long. Sometimes I'm in a hurry and want to go to just that clip. I have no idea how to design the UX for this. :-)
Hi!
Does the audio have timestamps, or do you want to auto create timestamps for it, or do you want to manually timestamp it?
I've had an idea that would be great for language learning, but I don't have the hardware or the time to do it. I think that if you have both the text and the audio, you can timestamp the text automatically like this:
- Generate a text from the audio with a speech-to-text AI tool like whisper (it has a fork that timestamps words).
- Compare the speech-to-text result with the original text, and you can add timestamps to it where it matches.
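A rough sketch of the comparison step, assuming the speech-to-text tool has already produced a list of (word, start time) pairs. The function names and the sample transcript are made up for illustration and are not the output format of any particular tool. It uses Python's `difflib.SequenceMatcher` to line up the two word sequences and copies a timestamp only where the words match:

```python
import unicodedata
from difflib import SequenceMatcher

def normalize(word):
    # Compare words in one Unicode form, lowercased, without edge punctuation,
    # so that "Bắp." in the text can match "bắp" in the transcript.
    word = unicodedata.normalize("NFC", word)
    return word.lower().strip(".,!?:;\"'()")

def align_timestamps(original_text, transcript):
    """transcript: list of (word, start_seconds) pairs (assumed input format).
    Returns a list of (original_word, start_seconds or None)."""
    original_words = original_text.split()
    a = [normalize(w) for w in original_words]
    b = [normalize(w) for w, _ in transcript]
    result = [(w, None) for w in original_words]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            word = original_words[block.a + k]
            result[block.a + k] = (word, transcript[block.b + k][1])
    return result

# Invented sample data: the recognizer misheard "ba" as "bà".
sample_transcript = [("con", 0.0), ("chào", 0.4), ("bà", 0.8),
                     ("ạ", 1.1), ("mẹ", 2.0), ("ơi", 2.3)]
aligned = align_timestamps("Con chào ba ạ. Mẹ ơi!", sample_transcript)
for word, start in aligned:
    print(word, start)
```

Words that the recognizer got wrong simply stay without a timestamp, so a player could fall back to the nearest earlier word that has one.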
I think there is already a similar tool for movies and subtitles called SubSync. It would be great to be able to use a book and its audiobook, and jump to words or sentences easily.
Oh, I think someone already made it: SubPlease
Hi @simjanos-dev (LinguaCafe!!! :tada: :wave:),
This may be a non-trivial item, for a few reasons. I think users should be able to add timestamps, b/c sometimes I want to add a bunch of timestamps for my own listening enjoyment :-P
Misc rambling thoughts ...
Some notes about data storage
no timestamps in the input text
Lute saves its texts as plain text, and only parses and finds terms at render time. There aren't any timestamps in the file, and the Lute parser doesn't know how to handle those. E.g. the above page is only stored as:
Học viên không nhìn. Giáo viên đọc cho học viên viết các từ và câu dưới đây.
1. Con chào ba ạ.
2. Mẹ ơi!
3. Anh trai tôi tên là Bắp.
4. Còn chị gái em? Chị ấy tên là gì?
For various reasons, when a page is rendered, the sentences for that page are also stored in the db, e.g.:
sqlite> select * from sentences where setxid = 4877 limit 30;
279117|4877|1|Học viên không nhìn.
279118|4877|2|Giáo viên đọc cho học viên viết các từ và câu dưới đây.
...
279121|4877|5|1.
279122|4877|6|Con chào ba ạ.
...
279124|4877|8|2.
279125|4877|9|Mẹ ơi!
...
audio files
The audio files are plain mp3 files, and can get quite large. The mp3 doesn't have timestamps either.
Keeping things in sync
- The text page can change -- users can edit text files when they're imported.
- Should timestamps be stored in the text itself? That would mean changing the stored page text whenever timestamps are added.
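One possible alternative, sketched here only to make the trade-off concrete: keep the timestamps outside the text, keyed by the sentence string, and look them up again after an edit. Everything in this sketch is hypothetical. The splitter is a naive stand-in and not Lute's parser, and the timestamp values are invented:

```python
import re

def split_sentences(text):
    # Naive stand-in for a real sentence parser: split after ., ! or ?
    # and at line breaks.
    parts = re.findall(r"[^.!?\n]+[.!?]?", text)
    return [p.strip() for p in parts if p.strip()]

def carry_over(timestamps, new_text):
    """timestamps: dict mapping sentence text -> start seconds (hypothetical).
    Returns (sentence, start_seconds or None) for each sentence in new_text."""
    return [(s, timestamps.get(s)) for s in split_sentences(new_text)]

# Invented values, for illustration only.
saved = {"Con chào ba ạ.": 12.5, "Mẹ ơi!": 15.0}

# The user edits the second sentence of the page.
edited_page = "1. Con chào ba ạ.\n2. Mẹ ơi, ba ơi!"
for sentence, start in carry_over(saved, edited_page):
    print(sentence, start)
```

Unchanged sentences keep their timestamps and edited ones lose them, which at least fails visibly. Keying on the sentence string breaks down when the same sentence appears twice on a page, so a real design would need something more robust.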