
After interruption, TTS restarts at wrong place

Open ntnsndr opened this issue 4 years ago • 9 comments

Issue details

Duplicate?

Have you searched this repository's issues to check whether your issue is already known? yes/no

Yes, and I haven't found it. I may have missed it; if so, apologies.

Actual behaviour

When I'm listening to TTS (my main way of using the app) and there is some kind of interruption (such as a text message ping or a voice instruction from a navigation app), TTS restarts after the interruption not where it left off but where I initially started it (usually the beginning of the article). This makes the app largely unusable in situations such as driving with a navigation app running.

Expected behaviour

TTS playback should continue after the interruption where it left off.

Steps to reproduce the issue

  1. Begin TTS playback
  2. Receive an instruction from an app like Google Maps, or receive a text message
  3. TTS restarts from where playback first began

Environment details

  • wallabag app version: 2.3.0
  • wallabag app installation source (e.g. Gplay, F-Droid, manual): F-Droid
  • Android OS version: 9
  • Android ROM (e.g. stock, LineageOS, SlimRom,…): LineageOS
  • Android hardware: Moto X
  • wallabag server version: 2.4.0-dev

Your experience with wallabag Android app

Have you had any luck using wallabag Android app before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

I love the app and use it daily. It is mostly an excellent experience. Thank you so much for all you do!

ntnsndr, Mar 29 '20

Maybe @tyndare has an idea how to fix this?

Strubbl, Mar 29 '20

I've been doing extensive refactoring of TTS-related stuff recently, so I advise against doing anything non-trivial with TTS at the moment. I may look into the issue myself later.

di72nn, Mar 29 '20

There seems to be no straightforward solution to this.

Here are some details on how it works now for anyone interested.

  1. The app parses the article content (HTML), looking for text nodes. Usually those are paragraphs, so a paragraph is often the unit of speech that gets produced.
  2. The app then sends those units of speech to a TTS engine for synthesis (the TTS engine sends the sound to the audio device by itself). There is no way to pause the engine, only to stop the sound it is currently producing, so pause/resume is not available.
  3. If the app needs to pause, it stops the current unit and, when it needs to resume, restarts that unit from the beginning. As a result, it pretty much always "rewinds".
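To make the "rewind" concrete, here is a minimal plain-Java model of the behaviour in the three steps above. `UnitQueueModel` is a hypothetical name, not a class from the app, and the real app talks to Android's `TextToSpeech` rather than returning strings; the point is only that the state holds a unit index and nothing about progress inside the unit.

```java
import java.util.List;

// Hypothetical model of the current behaviour: the app remembers WHICH text unit
// is being spoken, but not how far into that unit the TTS engine got.
final class UnitQueueModel {
    private final List<String> units; // e.g. one entry per paragraph
    private int current = 0;          // index of the unit being spoken

    UnitQueueModel(List<String> units) {
        this.units = units;
    }

    // Stand-in for handing a unit to the TTS engine; returns the text to be spoken.
    String speakCurrent() {
        return units.get(current);
    }

    // Move on to the next unit when the engine reports the current one is done.
    void next() {
        if (current < units.size() - 1) current++;
    }

    // "Pause" can only stop the engine; progress inside the unit is lost.
    void pause() {
        // Nothing to record beyond `current`.
    }

    // "Resume" re-sends the whole unit, so everything already spoken in it is repeated.
    String resume() {
        return speakCurrent();
    }

    int currentIndex() {
        return current;
    }
}
```

With paragraph-sized units, the amount repeated on every resume can be up to a whole paragraph.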

Instead of having the TTS engine play the sound it produces, it is possible to have it produce an audio file and have the app play that file. That would allow pausing and resuming as needed. But it may cause some delays, because audio can be played only once it is fully synthesized (with the current approach, TTS engines can probably play audio as they produce it). If a TTS engine uses server-side synthesis, the delays may be even worse (due to network latency). The delays may also depend on the size of the text units.
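A sketch of why file-based playback makes pause/resume exact: once each unit is a clip of known length, the app only has to remember a position inside the current clip. `ClipPlaybackModel` and its methods are hypothetical; on Android the clips could presumably come from `TextToSpeech.synthesizeToFile(...)` and be played with `MediaPlayer` (`pause()`, `start()`, `getCurrentPosition()`), but that wiring is an assumption not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each text unit has already been synthesized to a clip whose
// duration (ms) is known. Pausing just stops advancing; the position is kept.
final class ClipPlaybackModel {
    private final List<Long> clipDurationsMs = new ArrayList<>();
    private int clipIndex = 0;   // which clip is playing
    private long positionMs = 0; // position inside the current clip

    void addClip(long durationMs) {
        clipDurationsMs.add(durationMs);
    }

    // Advance playback by elapsedMs, moving to later clips as each one ends.
    void play(long elapsedMs) {
        while (elapsedMs > 0 && clipIndex < clipDurationsMs.size()) {
            long remaining = clipDurationsMs.get(clipIndex) - positionMs;
            if (elapsedMs < remaining) {
                positionMs += elapsedMs;
                return;
            }
            elapsedMs -= remaining;
            clipIndex++;
            positionMs = 0;
        }
    }

    int clipIndex() { return clipIndex; }

    long positionMs() { return positionMs; }
}
```

The trade-off mentioned above still applies: the first clip cannot start until its synthesis has finished.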

Another option is to reduce the size of text units (to a single sentence, for example) so the "rewind" is not so severe. I think that is a good idea in general (it would also allow short rewinds and fast-forwards).
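One possible way to get sentence-sized units is the JDK's locale-aware `java.text.BreakIterator`, which Android also ships (`android.icu.text.BreakIterator` is an alternative on API 24+). This is a sketch, not the app's code; `SentenceSplitter` is a hypothetical name, and sentence detection around abbreviations such as "e.g." may be imperfect depending on the locale rules.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: split one paragraph-sized text unit into sentence-sized units
// using the sentence-boundary rules for the given locale.
final class SentenceSplitter {
    static List<String> split(String paragraph, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(paragraph);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Boundaries include trailing whitespace, so trim and skip empty pieces.
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) sentences.add(sentence);
        }
        return sentences;
    }
}
```

Passing the article's language as the `Locale` is what would make this work across languages, to the extent the platform has rules for that language.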

Apparently, it is also possible to track how much of a speech unit has already been spoken, so the app could try to cut the already-spoken text when resuming. I don't know how precise this is or whether all TTS engines support it.
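A minimal sketch of cutting already-spoken text, assuming the engine reports character offsets of the word being spoken. On Android API 26+, `UtteranceProgressListener.onRangeStart(String utteranceId, int start, int end, int frame)` provides such offsets when the engine supports it; `SpokenRangeTracker` itself is hypothetical and would be fed from that callback.

```java
// Hypothetical helper: remembers the start offset of the last word reported as spoken
// within the current text unit, and trims the unit to that word on resume.
final class SpokenRangeTracker {
    // Progress callbacks may arrive on a background thread, hence volatile.
    private volatile int lastRangeStart = 0;

    // Call with the `start` offset the engine reports for the word now being spoken.
    void onRangeStart(int start) {
        lastRangeStart = start;
    }

    // Call when moving to a new text unit.
    void reset() {
        lastRangeStart = 0;
    }

    // Text to send on resume: begins at the interrupted word, so only that word repeats.
    String remainingText(String unit) {
        int from = Math.max(0, Math.min(lastRangeStart, unit.length()));
        return unit.substring(from);
    }
}
```

If an engine never calls `onRangeStart`, the offset stays 0 and this degrades to today's behaviour of replaying the whole unit.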

Feel free to share any ideas/opinions.

di72nn, Mar 31 '20

Thanks for this. I think the idea of reducing the size of text units, such as to a sentence or a paragraph, would be a serious improvement. It would make the difference between a minor inconvenience and unusability.

ntnsndr, Mar 31 '20

Splitting text units into sentences was on my wish list. I never took the time to really work on it; I was wondering whether there is a simple way to do it regardless of the language.

Currently the text unit depends on the HTML structure of the page, so in theory it should be no bigger than a paragraph.

When resuming, there is a feature that made sense to me for manual stop/resume but is probably a bug when resuming after an external interruption: resume checks whether the current text unit is visible on the screen, and if it is not, it resumes at the first text unit that is visible.

The TTS support in the Wallabag app automatically scrolls the screen when playing the next text unit, so usually there is no problem. I was wondering, though, whether there could be an issue on some devices when the Wallabag app is running in the background.

I know this is a problem if automatic switching to the next article is enabled and the Wallabag app is running in the background: Android does not really render the HTML of the new article while the app is in the background, so the position Android reports for the HTML text may be wrong (not all phones behave the same). If the position is wrong, the check that the current text unit is visible on the screen may fail, and playback may resume at the wrong place, such as the beginning of the article. To avoid this, I forced the Wallabag app to come to the foreground when switching to the next article so that the HTML is rendered correctly, but this does not work if the screen is locked.
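The resume check described above, and how a stale WebView defeats it, can be modelled in a few lines. This is a simplified sketch: `ResumeCheckModel` is hypothetical, a unit counts as visible when its top edge is inside the viewport, and the fallback when nothing is visible is an assumption.

```java
// Hypothetical model of the resume check. unitTops[i] is the vertical position (px)
// of text unit i as the WebView reports it; scrollY and viewportHeight describe the screen.
final class ResumeCheckModel {
    static int resumeIndex(int current, int[] unitTops, int scrollY, int viewportHeight) {
        if (isVisible(unitTops[current], scrollY, viewportHeight)) {
            return current; // current unit is on screen: keep reading it
        }
        // Otherwise jump to the first unit that is on screen.
        for (int i = 0; i < unitTops.length; i++) {
            if (isVisible(unitTops[i], scrollY, viewportHeight)) return i;
        }
        return current; // assumption: nothing visible, so leave the position alone
    }

    // Simplification: a unit is "visible" if its top edge lies within the viewport.
    private static boolean isVisible(int top, int scrollY, int viewportHeight) {
        return top >= scrollY && top < scrollY + viewportHeight;
    }
}
```

If the WebView has not updated in the background and still reports `scrollY == 0`, a unit far down the page looks off-screen and playback jumps back to unit 0, which matches the reported symptom.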

@di72nn you are right: an extensive refactoring is probably necessary... I didn't know it was possible to track how much text the TTS engine had spoken; I never tried it. I like your idea of rendering an audio file and playing it. It would give perfect pause/resume behaviour. Doing it per text unit rather than for the full article might make it possible to keep playback synchronized with page scrolling.

tyndare, Mar 31 '20

I didn't know it was possible to track how much text was spoken by TTS, I never tried it.

There are some newer methods available in UtteranceProgressListener (the class itself was added in SDK 15).

I just noticed that there is an onAudioAvailable(...) method, the description of which suggests that audio synthesized to a file may be available in chunks. But I'm not sure I want to deal with chunks; I should try whole files first.
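If chunks were used, the app would need to collect them and know how much audio it has. On Android API 24+, `UtteranceProgressListener.onBeginSynthesis` reports the sample rate, audio format and channel count, and `onAudioAvailable(String utteranceId, byte[] audio)` delivers the chunks. The collector below is a hypothetical sketch that assumes 16-bit PCM; it shows the bookkeeping, not how the app would play the result.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical collector for PCM audio that arrives in chunks.
final class PcmChunkCollector {
    private static final int BYTES_PER_SAMPLE = 2; // assumption: 16-bit PCM
    private final ByteArrayOutputStream pcm = new ByteArrayOutputStream();
    private final int sampleRateHz;
    private final int channelCount;

    PcmChunkCollector(int sampleRateHz, int channelCount) {
        this.sampleRateHz = sampleRateHz;
        this.channelCount = channelCount;
    }

    // Append one chunk in arrival order.
    void onChunk(byte[] audio) {
        pcm.write(audio, 0, audio.length);
    }

    // Duration of the audio received so far: frames divided by sample rate.
    long durationMs() {
        long frames = pcm.size() / ((long) BYTES_PER_SAMPLE * channelCount);
        return frames * 1000L / sampleRateHz;
    }

    byte[] toByteArray() {
        return pcm.toByteArray();
    }
}
```

Knowing the duration per unit is what would let a paused position be mapped back to a place in the text.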

di72nn, Apr 01 '20

One other observation here (let me know if this requires a separate issue): I remembered a persistent problem where, if I press the pause button in the Android (9) notification drawer and then press play again, playback starts over from where it originally began. In contrast, the play button within the app itself is much better at resuming where I was. I imagine this is the same problem in another guise.

ntnsndr, Apr 03 '20

@ntnsndr that's an important observation for your problem. The notification play/pause buttons and the buttons within the app do exactly the same thing. The difference is whether the app is in foreground (and/or the screen is on). That is the problem @tyndare described earlier.

To reiterate, the problem is that when you press "play" (regardless of where) the app checks that the text it is going to read is visible on the screen. If it's not, the TTS moves to the first text item visible on the screen. And since Android WebView (which we use to display article content) may not update itself if it's not in the foreground (it may stay at the beginning of the article, for example), the pause-play commands effectively reset TTS reading to the start.

One approach to deal with this is to reset to the visible text only when you press "play" from the article view (not from the notification, and not on automatic resume after interruptions). The downside is that the position will not change if you scroll the article and then press play from somewhere other than the article view.

Another idea is to skip the position reset only when the pause was triggered by an audio interruption. Everything stays as it is, except that interruption handling ignores the screen position. But that doesn't fix the problem with pause-play from the notification (if the app is in the background and the WebView is not updating).

The third idea is to save article scroll position in "pause" and check it in "play": if it wasn't changed, don't reset anything.

That last option seems to be the most reasonable of the three. Opinions?
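The third option could look roughly like this. `PauseResumePolicy` is a hypothetical name, and treating "play without a recorded pause" as a reason to re-sync is an assumption chosen to keep today's behaviour for a fresh start.

```java
// Hypothetical sketch of the third option: remember the article scroll position on
// pause, and only re-sync to the visible text on play if that position has changed.
final class PauseResumePolicy {
    private Integer scrollYAtPause = null; // null: no pause recorded

    void onPause(int currentScrollY) {
        scrollYAtPause = currentScrollY;
    }

    // true  -> jump to the first visible text unit (the user scrolled, or no pause recorded)
    // false -> continue with the current text unit
    boolean shouldResyncOnPlay(int currentScrollY) {
        boolean changed = scrollYAtPause == null || scrollYAtPause != currentScrollY;
        scrollYAtPause = null; // each pause is consumed by one play
        return changed;
    }
}
```

Because the check compares positions rather than asking where the app was paused from, it applies equally to the notification buttons, the in-app buttons and interruption handling.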

di72nn, Apr 04 '20

I just wanted to give this thread a bump, and link it to #429.

I feel like a lot of the TTS issues are related to the same core issue: if wallabag TTS could keep track of the sentence number (e.g. 13 of 45), it could be used to tell the WebView to update correctly when focus returns, and to highlight the sentence being read.
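A minimal sketch of the sentence-number idea: one piece of state that the UI, a highlight and a WebView re-scroll could all read from. `SentencePosition` is hypothetical and assumes the article has already been split into sentences.

```java
import java.util.List;

// Hypothetical tracker: holds the index of the sentence being read so the UI can
// show "13 of 45", highlight it, or scroll the WebView back to it on focus.
final class SentencePosition {
    private final List<String> sentences;
    private int index = 0; // 0-based index of the current sentence

    SentencePosition(List<String> sentences) {
        if (sentences.isEmpty()) throw new IllegalArgumentException("no sentences");
        this.sentences = sentences;
    }

    // Move to the next sentence; returns false at the end of the article.
    boolean advance() {
        if (index + 1 >= sentences.size()) return false;
        index++;
        return true;
    }

    // Short rewind by one sentence.
    void rewind() {
        if (index > 0) index--;
    }

    String current() {
        return sentences.get(index);
    }

    // Human-readable position, 1-based.
    String label() {
        return (index + 1) + " of " + sentences.size();
    }
}
```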

nosignal101, Oct 09 '22