Somehow directly link pages or even paragraphs to audio timestamps

Open jzohrab opened this issue 1 year ago • 2 comments

I'm learning Vietnamese, and sometimes only want to hear a particular thing from my audio, e.g.

Image

The audio for this page is 1:44 long, and sometimes I'm in a hurry and want to jump straight to just that clip. I have no idea how to design the UX for this. :-)

jzohrab, Mar 20 '24 03:03

Hi!

Does the audio have timestamps, or do you want to auto create timestamps for it, or do you want to manually timestamp it?

I've had an idea that would be great for language learning, but I do not have the hardware or the time to do it. I think that if you have both the text and the audio, it is possible to auto-timestamp them this way:

  1. Generate a text from the audio with a speech-to-text AI tool like Whisper (it has a fork that timestamps words).
  2. Compare the speech-to-text result with the original text, and add timestamps to the original wherever the two match.
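
A rough sketch of step 2 in Python, just to make the idea concrete. It assumes step 1 has already produced a list of `(word, start_seconds)` pairs; the function names and the sample data are made up for illustration, and the matching uses the standard library's `difflib.SequenceMatcher` rather than anything from Whisper or SubSync.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(word):
    # Compose accents (NFC), lowercase, and drop punctuation so that
    # "Mẹ" matches "mẹ" and "ơi!" matches "ơi".
    word = unicodedata.normalize("NFC", word).lower()
    return re.sub(r"\W+", "", word)


def align_timestamps(original_text, transcript):
    """Attach transcript timestamps to the words of the original text.

    transcript is a list of (word, start_seconds) pairs, assumed to come
    from a speech-to-text tool that reports word-level timestamps.
    Returns a list of (original_word, start_seconds or None).
    """
    original_words = original_text.split()
    a = [normalize(w) for w in original_words]
    b = [normalize(w) for w, _ in transcript]
    result = [(w, None) for w in original_words]

    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            if a[i]:  # skip tokens that were pure punctuation
                result[i] = (original_words[i], transcript[j][1])
    return result


# Hypothetical transcript in which the tool misheard "ba" as "bà".
transcript = [("con", 0.0), ("chào", 0.4), ("bà", 0.8),
              ("ạ", 1.1), ("mẹ", 2.0), ("ơi", 2.3)]
for word, start in align_timestamps("Con chào ba ạ. Mẹ ơi!", transcript):
    print(word, start)
```

Words that the speech-to-text tool gets wrong simply stay without a timestamp (here "ba"), so they could be filled in by hand or interpolated from their neighbours.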

I think there is already a similar tool for movies and subtitles called SubSync. It would be great to be able to use a book and an audiobook, and jump to words or sentences easily.

Oh, I think someone already made it: SubPlease

simjanos-dev, Mar 20 '24 22:03

Hi @simjanos-dev (LinguaCafe!!! :tada: :wave:),

This may be a non-trivial item, for a few reasons. I think users should be able to add timestamps, b/c sometimes I want to add a bunch of timestamps for my own listening enjoyment :-P

Misc rambling thoughts ...

Some notes about data storage

no timestamps in the input text

Lute saves its texts as plain text, and only parses and finds terms at render time. There aren't any timestamps in the file, and the Lute parser doesn't know how to handle those. E.g. the above page is only stored as:

Học viên không nhìn. Giáo viên đọc cho học viên viết các từ và câu dưới đây.

1. Con chào ba ạ.
2. Mẹ ơi!
3. Anh trai tôi tên là Bắp.
4. Còn chị gái em? Chị ấy tên là gì?

For various reasons, when a page is rendered, the sentences for that page are also stored in the db, e.g.:

sqlite> select * from sentences where setxid = 4877 limit 30;
279117|4877|1|​Học​ ​viên​ ​không​ ​nhìn​.​
279118|4877|2|​Giáo​ ​viên​ ​đọc​ ​cho​ ​học​ ​viên​ ​viết​ ​các​ ​từ​ ​và​ ​câu​ ​dưới​ ​đây​.​
...
279121|4877|5|​1.​
279122|4877|6|​Con​ ​chào​ ​ba​ ​ạ​.​
...
279124|4877|8|​2.​
279125|4877|9|​Mẹ​ ​ơi​!​
...
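
As a side note, the stored sentence text above contains invisible separators between tokens. A minimal Python sketch of turning one stored sentence back into a word list, which is the form a timestamp matcher would need. It assumes the separator is the zero-width space (U+200B); the helper names are hypothetical and not part of Lute.

```python
ZWS = "\u200b"  # zero-width space; assumed to be the invisible token separator


def split_tokens(stored_sentence):
    # Split on the separator and drop the empty strings left at both ends.
    return [t for t in stored_sentence.split(ZWS) if t != ""]


def words_only(stored_sentence):
    # Keep tokens that contain at least one letter or digit,
    # i.e. drop spaces and punctuation.
    return [t for t in split_tokens(stored_sentence)
            if any(ch.isalnum() for ch in t)]


# Rebuild the first stored sentence with explicit separators.
stored = ZWS + ZWS.join(["Học", " ", "viên", " ", "không", " ", "nhìn", "."]) + ZWS
print(split_tokens(stored))
print(words_only(stored))
```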

audio files

The audio files are plain mp3 files, and can get quite large. The mp3 doesn't have timestamps either.

Keeping things in sync

  • The text page can change -- users can edit text files when they're imported.
  • Should timestamps be stored in the text itself? That implies changing the stored page text whenever timestamps are added.

jzohrab, Mar 21 '24 02:03