dsnote Feature Request: Add timestamps to speech to text from audio file

Hi, I would like it if you can add a timestamp feature to the speech to text feature when using audio files. I regularly download audio from various websites and transcribe it to text using TurboScribe. I then use an LLM to summarize the audio and go through it quickly. I would like a similar feature to be implemented in your app so that I can switch my workflow to my local machine. Also having automatic detection of different speakers would be nice if possible. Thanks.

Feb 24 '25 04:02 mediocretwo

Hi. Thanks for your suggestions.

I would like it if you can add a timestamp feature to the speech to text feature when using audio files.

Could you elaborate on this more? What kind of timestamps? Currently, you can generate subtitles that contain timestamps. My guess is that you are looking for something different.

Also having automatic detection of different speakers would be nice if possible.

This is possible, but complicated because a decent model of speaker diarization is not completely free to download. You can read more about this problem here: https://github.com/mkiol/dsnote/issues/84.

Mar 01 '25 15:03 mkiol

Currently, you can generate subtitles that contain timestamps

I didn't realise this. I think I can work with this. I want to switch away from online services like TurboScribe. They provide timestamps as a simple number (like 00:10) before every few sentences. SRT subtitles are a good feature.

This is possible, but complicated because a decent model of speaker diarization is not completely free to download.

I see. Could we maybe give the user the option to provide contact info if desired? Let the users who want the diarization and are comfortable with providing contact info be able to do so and use the diarization feature. Those who do not want to do it won't be affected.

Mar 03 '25 14:03 mediocretwo

They provide timestamps as a simple number (like 00:10) before every few sentences.

SRT may be too verbose for some use cases. "A simple number before each sentence" can be relatively easy to implement. Let me know if you really need it :)
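In the meantime, the existing SRT export can be reduced to that simpler form outside the app. The following is only a minimal sketch, not how Speech Note does or would implement it: the function name `srt_to_simple` and the `MM:SS text` output layout are assumptions made for illustration, and it expects standard SRT cues (index line, timing line, text lines, blank line).

```python
import re

# Matches the start of an SRT timing line such as "00:00:10,500 --> 00:00:13,000".
TIMING = re.compile(r"(\d+):(\d{2}):(\d{2})[,.]\d{1,3}\s*-->")


def srt_to_simple(srt_text: str) -> str:
    """Turn SRT cues into lines like '00:10 Some text' (hypothetical helper)."""
    lines_out = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        rows = block.split("\n")
        for i, row in enumerate(rows):
            match = TIMING.match(row.strip())
            if match:
                hours, minutes, seconds = (int(g) for g in match.groups())
                # Join the cue's text lines into a single line.
                text = " ".join(r.strip() for r in rows[i + 1:] if r.strip())
                # Fold hours into minutes so the stamp stays in MM:SS form.
                total_minutes = hours * 60 + minutes
                lines_out.append(f"{total_minutes:02d}:{seconds:02d} {text}")
                break
    return "\n".join(lines_out)


sample = """1
00:00:00,000 --> 00:00:04,200
Welcome to the show.

2
00:00:10,500 --> 00:00:13,000
Today we talk about
local transcription.
"""

print(srt_to_simple(sample))
```

This emits one stamp per subtitle cue rather than one per few sentences, so the result is denser than the TurboScribe layout; merging neighbouring cues would be a further step.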

Let the users who want the diarization and are comfortable with providing contact info be able to do so and use the diarization feature.

Yes and no. Speech Note is promoted as a privacy-focused app, and you're right, the user always decides, but I don't want to encourage anyone to engage in risky behavior. This model is free, but for some reason you have to identify yourself with a Hugging Face account. I don't know why someone made this decision, or why anyone would want to know who is downloading this model. This is not transparent to me.

Mar 05 '25 17:03 mkiol

I see. Alright, no worries.

Let me know if you really need it :)

I would appreciate a simpler timestamp! Attaching the TurboScribe transcript to show how they do it. Maybe something like that. Thanks :)

Mar 06 '25 14:03 mediocretwo