NVDA isn't ignoring soft hyphens properly
Steps to reproduce:
- Save and open the following HTML:
<!DOCTYPE html> <html lang="en"> <head> <title>Hyphenation Test</title> </head> <body> <h1>Hooray for Hyphen­ation</h1> <p>This text contains some Hyphen­ation. Hope­fully it is not in­com­pre­hen­sible. Usually NVDA can process soft hyphens in documents pretty well, but did you notice the previous word "incomprehensible"?</p> <p>Pay attention to the word "pronunciation", which is pronunced differently with soft hyphens in it:</p> <p><strong>With: </strong> "pro­nun­cia­tion"</p> <p><strong>Without:</strong> "pronunciation"</p> <h2>Pro­nun­cia­tion is the key to under­standing the spoken word</h2> <p><strong>Did you know?</strong> Soft hyphens have been around since the 80s!</p> <p>The next heading does not contain hyphens.</p> <h2>Pronunciation is the key to understanding the spoken word</h2> <p>It's a pitty hyphenation cannot be reliably applied via CSS.</p> </body> </html>
- Read the document with NVDA.
- Open NVDA's Elements List with NVDA+F7.
- In the Elements List, switch to headings and let NVDA read the entries to you. (Soft hyphens are shown on the display.)
Actual behavior:
Soft hyphens are splitting words and causing odd pronunciations.
Expected behavior:
Soft hyphens are ignored.
System configuration
NVDA installed/portable/running from source:
installed
NVDA version:
2018.4.1
Windows version:
Win 7 64 bit
Name and version of other software in use when reproducing the issue:
Firefox 65.0.2
Other information about your system:
Default language is German (but that shouldn't matter, should it?)
Other questions
Does the issue still occur after restarting your PC?
yes
Have you tried any other versions of NVDA?
no
Log
@Michael-Detmers: Thank you very much for opening this issue, because I planned exactly the same.
But I would suggest that the user should still have the option to enable and disable filtering of the soft hyphen (U+00AD) via the Browse Mode NVDA Settings. As a web developer you should be able to check the correct position of soft hyphens in all web browsers. But normally this character has no benefit for screen reader users. And sadly, because of responsive web design, this character is used more and more often. Reading a news article that contains "hundreds" of them via speech and/or braille is extremely annoying.
CC: @michaelDCurran, @jcsteh and @MarcoZehe
One more question: What shall we do with the new HTML5 tag <wbr>? Any thoughts?
@DrSooom wrote: "One more question: What shall we do with the new HTML5 tag <wbr>? Any thoughts?"
Since its purpose is to affect how a line of text is displayed, I'd vote for it to be generally ignored as well. And, as you suggest, there would have to be an option to read all punctuation and special characters verbatim for development and quality assurance purposes.
A practical example where this is a huge problem is the CTAN repository for LaTeX packages, for example the page for the amsmath package.
Sadly, no change. On Windows 10 (Enterprise, 64 bit) with Firefox 79, NVDA 2020.2 still chops up words containing soft hyphens. (And neither CSS nor browsers have provided us with a reliable, universal alternative to soft hyphens yet.)
For the most part, hyphenation is unfortunately needed to meet the WCAG reflow requirements. Without it, long words will simply either flow out of visible areas, overlap each other or - ironically - will also be visually chopped up without any sign of continuation, since the hyphens are missing.
So the current state is this: Either we make it hard to understand for our blind visitors or for our sighted ones. And because I cannot find a suitable WCAG requirement for, so to speak, "avoiding hyphenation", with a heavy heart I still must recommend sticking to the thirty-year-old Unicode control character. I hope widespread support for these typography tools will be available soon, so all users can have a great experience.
Switching the soft hyphen to be passed to the synthesiser (in Punctuation/symbol pronunciation...) fixes the issue, at least with eSpeak. I'll look into how it goes with other synths and see if I can change that to be the default and make a PR.
Shouldn't have been so hasty. While eSpeak handles soft hyphens correctly, none of the other synths I have installed (SAPI5, OneCore, Eloquence and Vocalizer) do. While handling them correctly probably should be up to the synthesiser, just switching them to be passed directly to the synth is not a very satisfactory solution. I'm not sure that having a setting to strip them is particularly satisfactory either. As a temporary workaround, a speech dictionary entry that replaces them with the empty string seems to work (suggested by Ralf Kefferpuetz on the mailing list).
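The speech dictionary workaround can be illustrated with a minimal Python sketch. This is not NVDA's speech dictionary code; `DICTIONARY` and `apply_dictionary` are hypothetical names that only mimic the idea of a pattern/replacement entry that maps U+00AD to the empty string before text reaches the synthesiser.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Hypothetical stand-in for a speech dictionary: (pattern, replacement) pairs.
# The single entry replaces every soft hyphen with the empty string.
DICTIONARY = [(re.compile(SOFT_HYPHEN), "")]


def apply_dictionary(text, entries=DICTIONARY):
    """Apply each (pattern, replacement) entry to the text in order."""
    for pattern, replacement in entries:
        text = pattern.sub(replacement, text)
    return text


cleaned = apply_dictionary("Hooray for Hyphen\u00adation")
```

Because the replacement happens before synthesis, the synthesiser receives an unbroken word regardless of how it treats U+00AD itself.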
See also: https://github.com/nvaccess/nvda/issues/10634#issuecomment-566758080
I guess to fix this properly, we need an additional behavior in the speech symbol processor that simply discards the symbol as if it weren't there.
@leonardder: Please don't overlook the braille output, as ⠁⠏⠏⠇⠊⠉⠁⠞⠊⠕⠝ is also easier to read instead of ⠁⠏⠏⢤⠇⠊⢤⠉⠁⢤⠞⠊⠕⠝ (⢤ = SHY in German 8-dot), but both are needed depending on the situation (e.g. dictionaries, word processing, web/app development). I already pointed this out in my above linked comment.
I think handling soft hyphens primarily should be a task of the braille table. In the Dutch 8 dot table for example, we ignore it completely.
Quoting the previous comment: "In the Dutch 8 dot table for example, we ignore it completely."
This is, in my opinion, highly unwanted for the reasons I mentioned above, because you cannot check the correct position of the SHY character unless you use TTS at the same time. And even then, TTS only works correctly if you navigate character by character, which is time-consuming and thus not really comfortable.
As you already know from me in such situations: The end user should have the power to change this behavior, not only the (liblouis) devs on their behalf. And issue #10634 also handles additional Unicode characters, which should be ignored in braille and speech output in the same way. So it's easier to add SHY (U+00AD) to that list as well.
Please, ignore SHY. For German we need lots of soft hyphens. In many projects we need automated hyphenation, which escalates this problem.
The issue is fixed when setting "Punctuation/symbol level" to "some" in the Speech settings. I use NVDA version 2020.4.
I don't understand why removing soft hyphens is not desirable. Normally a soft hyphen is invisible and should not be announced. And if it is shown, it should IMHO not be announced either. It conveys absolutely no information related to the contents.
Removing soft hyphens is desirable.
NVDA ignores and doesn't announce soft hyphens when the user has set "Punctuation/symbol level" to "none" or "some" in the NVDA Speech settings. "some" is the default. NVDA announces "soft hyphen" for each soft hyphen when the user has set "Punctuation/symbol level" to "most" or "all" in the NVDA Speech settings.
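The level behavior described in this comment can be summarised in a small Python sketch. The enum and function names are hypothetical and the numeric values are arbitrary; the only thing taken from the comment is that the soft hyphen is announced at "most" and "all" but not at "none" or "some".

```python
from enum import IntEnum


class SymbolLevel(IntEnum):
    # Hypothetical ordering of the four user-selectable levels.
    NONE = 0
    SOME = 100
    MOST = 200
    ALL = 300


# Assumption drawn from the comment above: the soft hyphen sits at "most".
SOFT_HYPHEN_LEVEL = SymbolLevel.MOST


def is_announced(user_level, symbol_level=SOFT_HYPHEN_LEVEL):
    """A symbol is spoken when the user's level is at or above the symbol's level."""
    return user_level >= symbol_level


announced_by_level = {level.name: is_announced(level) for level in SymbolLevel}
```

Note that this only models whether the name "soft hyphen" is spoken. It says nothing about how the surrounding word is pronounced, which is the actual subject of this issue.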
An NVDA user might want to verify the correct positioning of the soft hyphens and therefore needs an option to make NVDA announce them.
Microsoft Word has a similar setting, the Show/Hide Paragraphs option. Word's help explains: "Show paragraph marks and other hidden formatting symbols. This is especially useful for advanced layout tasks." This option shows optional hyphens. The fact that this option exists shows that there are valid use cases for revealing hidden symbols, for example proofreading that includes formatting symbols.
@julianladisch: Which TTS synthesizers are you using? And which languages?
An NVDA user might want to verify the correct positioning of the soft hyphens and therefore needs an option to make NVDA announce them.
That makes sense. My fault that I didn't think of that use case.
So it boils down to the question of whether soft hyphens should be announced with the setting "most" or only with "all". Or they could get their own setting. After all, proofreading is probably not what users do all the time.
Just a quick reminder: this issue is NOT about the announcement of the "shy" character. It is about the odd pronunciation of whole words in which "shy" is used.
cc: @michaelDCurran
Thank you for the clarification.
The steps to reproduce should be extended:
Disable soft hyphen pronunciation in the punctuation/symbols level settings and the Symbol Pronunciation settings. This works, NVDA doesn't say "soft hyphen".
The "Actual behavior" should be:
The pronunciation is the same as if each soft hyphen were replaced by a space. NVDA incorrectly pronounces "pro­nun­cia­tion" like "pro nun cia tion". NVDA incorrectly pronounces each syllable as a separate word. NVDA incorrectly pronounces the "cia" syllable as "CIA" (Central Intelligence Agency).
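The difference between the actual and the expected behavior can be shown with plain string operations. This is only an illustration of the two outcomes described above, not NVDA code; the variable names are made up for the example.

```python
SOFT_HYPHEN = "\u00ad"

word = "pro\u00adnun\u00adcia\u00adtion"

# Actual behavior as reported: each soft hyphen acts like a space,
# so the synthesiser receives four separate "words".
actual = word.replace(SOFT_HYPHEN, " ")
actual_words = actual.split()

# Expected behavior: the soft hyphen is dropped and the word stays whole.
expected = word.replace(SOFT_HYPHEN, "")
```

Here `actual_words` contains the fragments "pro", "nun", "cia" and "tion", while `expected` is the single word "pronunciation".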
I confirm this bug.
The behavior I'm seeing is this:
- you can change the symbol level in the symbol dic to whatever you want, this should help with when you need to proofread
- the real issue is that any symbol always causes a word break, even if its replacement is set to the null string
- the behavior we'd like, especially for the soft hyphen case, is that when we set the replacement to the null string, NVDA processes the entire word as if the symbol didn't exist at all (and just uses its default word break characters)
We could enhance this to only omit the symbol completely when certain conditions are met, such as level is none or character and "send symbol to synthesizer" is set to never.
No progress on this? Soft hyphens have been around for ages and are a must-have for many languages - ok, for German at least. Just because English tends to have short words the problem should not be dismissed. Maybe it isn't, after all the issue has not been closed.
See also #13668
Same in Finnish and Swedish, @masi. This is a big need, and it is really against all specs that they are pronounced.
I've been hanging on to the hope that we can use soft hyphens, which are so important, while also meeting our accessibility requirements. These surely matter to other Europeans too at this point, given the new EC directive and all the languages with long compound words.
VoiceOver does it great, doesn't that put the fire under you to improve this product that so many people rely on? 🔥 😉
To summarize this issue, I think to bring this further, we need to do the following:
- Change the level for the soft hyphen to character
- Ensure that when level is character and send to synthesizer is never, the character is ignored completely when not reading by character.
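The two steps above can be sketched in Python as follows. This is a simplified model, not NVDA's actual symbol processor: the `Symbol` dataclass, the `LEVEL_ORDER` table, and the `process` function are hypothetical, and the "current" branch that emits a space merely imitates the word break reported earlier in this thread.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    char: str
    replacement: str
    level: str     # "none", "some", "most", "all" or "character"
    preserve: str  # "never" or "always" (send the raw symbol to the synthesiser)


# Hypothetical ordering; "character" is above every user-selectable level,
# so such symbols are only spoken when reading character by character.
LEVEL_ORDER = {"none": 0, "some": 100, "most": 200, "all": 300, "character": 1000}


def process(text, symbols, user_level):
    """Process text for word or line reading (not character-by-character)."""
    table = {s.char: s for s in symbols}
    out = []
    for ch in text:
        sym = table.get(ch)
        if sym is None:
            out.append(ch)
        elif LEVEL_ORDER[user_level] >= LEVEL_ORDER[sym.level]:
            out.append(f" {sym.replacement} ")  # symbol is spoken by name
        elif sym.preserve == "always":
            out.append(ch)  # left for the synthesiser to handle
        elif sym.level == "character":
            continue  # proposed: discard as if the symbol were not there
        else:
            out.append(" ")  # imitates the reported behavior: a word break
    return "".join(out)


shy = Symbol("\u00ad", "soft hyphen", level="character", preserve="never")
result = process("pro\u00adnun\u00adcia\u00adtion", [shy], "all")
```

With the symbol at level "character" and "never" sent to the synthesiser, the word passes through unbroken at every user level. Character-by-character review is outside this sketch and would still report the soft hyphen.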
I'd personally leave braille out of the discussion for now, though my standpoint is still that this is the translator's responsibility, otherwise we're very likely getting into routing issues.
Hmm, if the soft hyphen is still navigable in character-by-character navigation, it will still affect word-by-word navigation as well. I think it might be worth considering a checkbox in the browse mode settings to ignore soft hyphens completely when navigating through the virtual document. It is a small additional setting, I know, but it seems to have a big impact.
Adding an extra option to browse mode settings isn't as impactless as you may think. Filtering characters from TextInfo is never trivial, not even in browse mode. Furthermore, character navigation should represent reality. If there is a soft hyphen in the text, I want to see it with character navigation, simply because it is there.
How is this displayed visually? Are there any visual spaces instead of the hyphens themselves? If yes, I agree with you. But the UX will still be confusing when people navigate word by word while the word is split into several parts.
In computing and typesetting, a soft hyphen (Unicode U+00AD SOFT HYPHEN) or syllable hyphen, is a code point reserved in some coded character sets for the purpose of breaking words across lines by inserting visible hyphens if they fall on the line end but remain invisible within the line.
Source: https://en.wikipedia.org/wiki/Soft_hyphen
See also my previous comments, e.g. https://github.com/nvaccess/nvda/issues/9343#issuecomment-469159685, https://github.com/nvaccess/nvda/issues/9343#issuecomment-706638240, https://github.com/nvaccess/nvda/issues/9343#issuecomment-707299234 and https://github.com/nvaccess/nvda/issues/10634#issuecomment-566758080