(Firefox) NVDA goes silent for ~30 seconds before announcing extremely long text in X/Twitter embedded timeline

Open EricDunsworth opened this issue 5 months ago • 15 comments

When using X/Twitter's embedded timeline widget, providing a data-height attribute causes a div element containing 100 tweets to become keyboard-focusable in Firefox... Focusing onto it causes NVDA to go completely silent for ~30 seconds.

I believe the aforementioned div is becoming focusable due to a bug in one of the widget's Firefox polyfills. But I don't think there's currently any viable way of bringing it to X/Twitter's attention. They laid off their a11y team in 2022 and timeline widget development appears to have stalled (still refers to "Twitter" to this day).

In any case, I wanted to report NVDA's "silence" issue.

PS: The underlying issue doesn't appear to be specific to Firefox. If I manually add tabindex="0" to the problematic div in Chromium browsers and then try focusing it, NVDA still goes silent for quite a while.

Steps to reproduce:

  1. Open Firefox
  2. Visit this JS Bin sample
  3. Wait for the X/Twitter timeline widget to finish loading (~25 seconds)
  4. Tab to the "Before widget" link ("Before widget link")
  5. Tab to the iframe element (no announcement)
  6. Tab to the link bar ("Twitter Timeline frame clickable Tweets from @ XDevelopers link Follow on Twitter link")
  7. Tab to the link bar's first nested link ("Tweets from @ XDevelopers link link")
  8. Tab to the link bar's second nested link ("Follow on Twitter visited link link")
  9. Tab to the tweets container div

Actual behavior:

NVDA goes completely silent for ~30 seconds, then announces "Timeline region clickable Developers verified account..." (covers 100 tweets, extremely long). Tabbing to anything else during the period of silence changes what ultimately gets announced, but doesn't shorten the wait.

Speech viewer quickly logs the text in question right before the period of silence.

Expected behavior:

NVDA quickly begins announcing "Timeline region clickable Developers verified account..." (covers 100 tweets, extremely long).

NVDA logs, crash dumps and other attachments:

Gist: Speech viewer log (starting from step 4)

System configuration

NVDA installed/portable/running from source:

Installed

NVDA version:

2023.3.3 (2023.3.3.30854)

Windows version:

10.0.19045 Build 19045 (22H2)

Name and version of other software in use when reproducing the issue:

  • Firefox version: 123.0
  • Firefox Developer Edition version: 124.0b4

Other information about your system:

N/A

Other questions

Does the issue still occur after restarting your computer?

Yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

NVDA 2023.1 had similar behaviour (updated to 2023.3.3 to see if anything changed)

If NVDA add-ons are disabled, is your problem still occurring?

Yes (using factory default configuration - doesn't come with any add-ons)

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

Haven't tried

EricDunsworth avatar Feb 27 '24 23:02 EricDunsworth

Please report this to Twitter/X. Even though it is likely they will be unresponsive, there's likely very little we can do in this situation

seanbudd avatar Mar 05 '24 01:03 seanbudd

@seanbudd But regardless of X/Twitter's buggy focusing behaviour, wouldn't the very long period of silence ultimately be an NVDA issue? That's why I felt the need to report it here.

My intent wasn't to try getting NVDA to implement a workaround for X/Twitter's bug, but to be able to better adapt to this kind of "worst-case" scenario (i.e. focusing onto extremely long text).

I presume the silence might be due to NVDA taking a while to process the long text prior to announcing anything. So I was hoping that this issue could potentially lead to performance optimizations. Or perhaps quicker responsiveness if a user tries changing focus during the period of silence.

For instance, as things stand right now, if a user tries focusing away from the extremely long text during the period of silence towards much shorter text, NVDA will correctly announce the shorter text... eventually. That may indicate that NVDA is still trying to process the long text after it's become irrelevant. So maybe something could be done to make NVDA immediately drop what it's doing to avoid the silence in that scenario?

Hope this helps clarify what I had in mind 😃!

EricDunsworth avatar Mar 05 '24 16:03 EricDunsworth

CC @michaelDCurran @jcsteh

It does seem concerning that this is even possible. A web site shouldn't be able to create this behavior in NVDA, IMO. Or at the very least, such processing should cancel with the speech.

XLTechie avatar Mar 05 '24 20:03 XLTechie

I'm not questioning the validity of this bug, but for diagnostic purposes, what speech synthesiser are you using? Have you tested this with eSpeak? Because OneCore can't stream speech - i.e. it processes the entire chunk of speech in a single hit - I'm wondering whether that might be at play here if you're using OneCore. An NVDA log (from NVDA menu -> Tools -> View log) would also be useful.

jcsteh avatar Mar 05 '24 21:03 jcsteh

Looks like I'm using OneCore (default speech synthesizer when I set factory defaults).

Just tried switching to eSpeak NG. The period of silence was dramatically reduced from ~30 to ~3 seconds.

Here's a log of me starting with factory defaults (OneCore), following steps 4-9, switching to eSpeak NG and retrying the steps again. Only allowed NVDA to announce the start of the extremely long text both times.

EricDunsworth avatar Mar 05 '24 23:03 EricDunsworth

Obviously, even a 3 second freeze isn't great, but I think the egregious part of this is the 27 second freeze caused by OneCore. I've thought for a while that our OneCore driver (since Microsoft clearly can't be bothered to fix this themselves) should probably split up input to prevent this kind of problem, as well as improve responsiveness generally. The question is: where do we split? Splitting in the wrong places will result in annoying, unnatural breaks in speech for users. If we don't split enough, we get performance problems. As always, the devil is in the details.

jcsteh avatar Mar 05 '24 23:03 jcsteh

But also, when doing speakTextInfo with OutputReason.FOCUS, NVDA should probably truncate the output if it is larger than a certain size.
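A minimal sketch of what such a cap could look like, assuming the text is already available as a plain string. The function name `truncate_for_focus` and the 1,000-character limit are hypothetical and not part of NVDA; the sketch only shows cutting at a word boundary so the last word is not chopped in half.

```python
def truncate_for_focus(text: str, max_chars: int = 1000) -> str:
    """Return text cut to at most max_chars, preferring a word boundary.

    Hypothetical helper: both the name and the default limit are
    illustrative assumptions, not NVDA APIs or settings.
    """
    if len(text) <= max_chars:
        return text
    # Look for the last space before the limit so a word is not cut in half.
    cut = text.rfind(" ", 0, max_chars)
    if cut <= 0:
        # No usable space found: fall back to a hard cut at the limit.
        cut = max_chars
    return text[:cut]
```

Whether a real implementation should cut on characters, words or TextInfo units is an open design question; this only illustrates the size check.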

jcsteh avatar Mar 05 '24 23:03 jcsteh

@jcsteh It seems obvious to say this, but perhaps you split somewhere that there is going to be a pause anyway?

So, after some configured length, you start looking for a period or comma, preferably a period, and split there.

I.e. after 5,000 chars or whatever, you check the next 1,000 for a period. If there is one, you split there. If there isn't one, walk down the punctuation pause list (e.g. comma, semicolon, etc.), and split on one of those. If you still don't find one, just split on the first whitespace after character 5,000.
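The steps above can be sketched roughly as follows. This is an illustration only: the function names, the `PAUSE_CHARS` list and the default thresholds are assumptions made for the example, not code from NVDA or its OneCore driver.

```python
from typing import List

# Pause-granting punctuation, strongest first. This short list is an
# assumption for illustration; a real list would need per-language review.
PAUSE_CHARS = [".", ";", ","]


def find_split(text: str, start: int, min_len: int, window: int) -> int:
    """Return the index at which the chunk beginning at start should end."""
    if len(text) - start <= min_len:
        # The remainder is short enough to be spoken as a single chunk.
        return len(text)
    lo = start + min_len
    hi = min(lo + window, len(text))
    # Walk down the punctuation list, searching only inside the window.
    for ch in PAUSE_CHARS:
        pos = text.find(ch, lo, hi)
        if pos != -1:
            return pos + 1  # keep the punctuation with the preceding chunk
    # No punctuation found: split after the first whitespace past min_len.
    for i in range(lo, len(text)):
        if text[i].isspace():
            return i + 1
    # No whitespace at all: give up and return the remainder unsplit.
    return len(text)


def split_for_speech(text: str, min_len: int = 5000, window: int = 1000) -> List[str]:
    """Split text into chunks of more than min_len characters (except the last)."""
    chunks = []
    start = 0
    while start < len(text):
        cut = find_split(text, start, min_len, window)
        chunks.append(text[start:cut])
        start = cut
    return chunks
```

Joining the returned chunks reproduces the original text exactly, so nothing is dropped; only the points at which the synthesizer receives input change.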

I think that people would rather have the small delay this will cause than the huge delay involved in this bug.

OneCore is already hellaciously slow and pause-prone, so I doubt it will matter.

XLTechie avatar Mar 06 '24 05:03 XLTechie

Sure, it "seems obvious", but as I said, the devil is in the details. What you've just proposed is great if you speak English and a few other languages. But what about languages with entirely different character sets? What are the rules for pausing there?

jcsteh avatar Mar 06 '24 06:03 jcsteh

I was generalizing to English, but would not the concept hold true in other supported languages?

I have researched this slightly, and punctuation appears to be a global feature in all modern languages. It remains optional in some and cannot be assumed to exist, but the chances of a multi-thousand character string of human-readable text existing without at least some kind of pause-granting punctuation seem rather low.

Or am I missing a more subtle aspect?

XLTechie avatar Mar 06 '24 06:03 XLTechie

I don't think there's a more subtle aspect. I just don't know of a documented, evidence-based list of "some kind of pause-granting punctuation", so I'm reluctant to make assumptions. But I suppose we could start with a small list and work upwards.

Note that say all already has some pause detection code, but it's somewhat limited (English-like full stops only) and it's regexp based. It's less risky there because it's used to avoid pauses, not add more pauses.

jcsteh avatar Mar 06 '24 07:03 jcsteh

Once again, this is a situation where we need to divide text into sentences, or even smaller chunks.

In addition to the sayAll case, which I did not know about, there have recently been other implementations of such a division algorithm:

  • in documentNavigation\sentenceHelper.py used for the paragraph navigation feature
  • "Regular expression for text paragraph navigation" in Advanced Settings, used in Text paragraph navigation feature (#16031).

Note that Arabic punctuation signs do not seem to appear in these regexps; I think that they also often use English punctuation, but for completeness, Arabic ones should also be included IMO.

I wish both these algorithms were factored out as explained in https://github.com/nvaccess/nvda/pull/16031#issuecomment-1932388384.

More generally, I really think that we would need a common general algorithm to detect sentence boundaries that may be used for:

  • say all
  • text paragraph navigation
  • paragraph navigation
  • future sentence navigation feature (#8518)
  • splitting speech text where needed, e.g. in OneCore driver to solve the current issue.
  • and probably useful elsewhere

CyrilleB79 avatar Mar 06 '24 08:03 CyrilleB79

I'd caution that while abstraction, encapsulation and code reuse are great and should be done where possible, there are factors other than these to consider. For example, the performance characteristics in all of these situations are likely slightly different; the amount of text that needs to be handled in one shot, whether we want all breaks or just the first/last, etc. Also, there are other constraints involved. For example, for TextInfos, the text you have might not include the start or end of the sentence and you might have to walk to find the next boundary, whereas most algorithms (such as those in icu) expect a complete block of text. You might also not be able to use offsets for some TextInfos, which means you can't reliably use an algorithm that processes anything larger than a word at a time.

jcsteh avatar Mar 06 '24 08:03 jcsteh

Distilled test case in https://github.com/nvaccess/nvda/issues/16307#issuecomment-1999122043.

jcsteh avatar Mar 15 '24 22:03 jcsteh

In some cases there might be other factors causing a freeze, for example speech dictionary substitution processing, or character processing with a large string and a regexp that backtracks heavily.

thgcode avatar Apr 30 '24 23:04 thgcode