nvda icon indicating copy to clipboard operation
nvda copied to clipboard

Tab characters should be considered blank

Open SamKacer opened this issue 2 years ago • 11 comments

Steps to reproduce:

Read the two lines between "start" and "end":

start
    
		
end

(Note: The first line after start has 4 spaces and the next line has 2 tabs)

Actual behavior:

They are read as:

start
blank
*nothing is read*
end

Further, if indentation reporting is on, then they are read as:

start
4 space blank
2 tab blank
end

Expected behavior:

I would expect the line with the 2 tabs should be read as "blank" when indentation reporting is turned off, since it is a blank line when indentation reporting is turned on. Nothing being read at all causes confusion when reading code, since it can seem as if NVDA has stopped responding.

System configuration

NVDA installed/portable/running from source: installed

NVDA version: 2021.3.3

Cause of problem

In speech.py, whether text is considered blank is determine by this code:

#: If a chunk of text contains only these characters, it will be considered blank.
BLANK_CHUNK_CHARS = frozenset((" ", "\n", "\r", "\0", u"\xa0"))
def isBlank(text):
	"""Determine whether text should be reported as blank.
	@param text: The text in question.
	@type text: str
	@return: C{True} if the text is blank, C{False} if not.
	@rtype: bool
	"""
	return not text or set(text) <= BLANK_CHUNK_CHARS

The tab character is missing from BLANK_CHUNK_CHARS , which is why it isn't considered blank. The solution would be to also include it in the set.

Not sure if there is something I am missing and this is actually the intended behavior, but the inconsistency between the line being blank when indentation reporting is on/off makes me think it is a bug.

If it is agreed this is a bug, I would be happy to provide the PR as my first contribution to the NVDA project.

SamKacer avatar Mar 05 '22 13:03 SamKacer

Accidently submitted before writing anything, editing it now...

SamKacer avatar Mar 05 '22 13:03 SamKacer

Edited it now with full context

SamKacer avatar Mar 05 '22 13:03 SamKacer

I realized one reason why tab is probably not in BLANK_CHUNK_CHARS. When symbol level is set to All, tab is read, however the characters in Blank_chunk_chars are never read at any symbol level.

I experimented with adding tab to BLANK_CHUNK_CHARS and it does cause the line with tabs to be read as "blank", however when symbol level is All, the line is still read as "blank", even though the expected behavior would be to read the tabs.

Because of this, I no longer believe the correct solution is to add tab to BLANK_CHUNK_CHARS, but I also believe the behavior I described in the first comment needs to be fixed, since I don't think NVDA should never just speak nothing when all the characters in a text are not in the current symbol level

I think the correct solution could be to make it so, when an empty speech sequence is produced because none of the characters are read in the current symbol level, then the empty speech sequence should be changed to "blank"

SamKacer avatar Mar 05 '22 15:03 SamKacer

Experimenting with the idea that if text is empty after applying dictionary substitutions and symbol level, then it should be read as "blank", I achieved the desired behavior when I altered speech.speech.processText like this:

 def processText(locale,text,symbolLevel):
 	text = speechDictHandler.processText(text)
 	text = characterProcessing.processSpeechSymbols(locale, text, symbolLevel)
 	text = RE_CONVERT_WHITESPACE.sub(u" ", text)
-	return text.strip()
+	text = text.strip()
+	return text if text else _("blank")

With that experimental change, a line with just tabs is read as "blank" when symbol level is not All and as "tab tab" when symbol level is All. Also aline with other punctuation that is only read at a certain symbol level is also read as "blank" if symbol level is switched to something lower.

There is one remaining somewhat unexpected behavior. If indentation reporting is on, the two tab line is read as "2 tab blank" or "beep blank", whereas when indentation reporting is off, it is read as "tab tab". Perhaps although slightly unintuitive, it might actually be the prefferential behavior.

Either way, I think the changes I proposed with a little cleanup would improve over the current behavior. Please let me know what you think and I can then create a PR

SamKacer avatar Mar 05 '22 16:03 SamKacer

Hi @SamKacer Since you have already a precise idea on how to implement a fix for this issue, would you mind submitting a pull request for it?

CyrilleB79 avatar Mar 09 '22 07:03 CyrilleB79

cc @SamKacer

cary-rowen avatar Mar 09 '22 11:03 cary-rowen

@CyrilleB79 yes, glad to help out and provide the PR. When I next have the time, I will put it together

SamKacer avatar Mar 09 '22 18:03 SamKacer

This should be closed I think, as this would apparently break add-ons, as seen in #14102

TheQuinbox avatar Sep 06 '22 17:09 TheQuinbox

@TheQuinbox - a way to avoid breaking backwards compatibility is described in that same comment

seanbudd avatar Sep 07 '22 00:09 seanbudd

@seanbudd, the main suggestion I took from that is a config option, and I fail to see how that would fix it, as add-ons would still possibly break for some people if they toggled the setting. A .1 release though, possibly?

TheQuinbox avatar Sep 07 '22 00:09 TheQuinbox

Hello, I don't think this should be closed just yet. Since, just because that approach was potentionally addon breaking, doesnt mean this issue cant be resolved with a different approach in a backwards compatible way, which I think it can be in my PR after I make the requested changes.

Apologies, I have fallen off with work on this due to personal reasons. I will try to get back to this soon, make the changes for my PR and take this forward

On Wed, Sep 7, 2022 at 1:46 AM Quin @.***> wrote:

@seanbudd https://github.com/seanbudd, the main suggestion I took from that is a config option, and I fail to see how that would fix it, as add-ons would still possibly break for some people if they toggled the setting. A .1 release though, possibly?

— Reply to this email directly, view it on GitHub https://github.com/nvaccess/nvda/issues/13431#issuecomment-1238785824, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIOBHCKO7V6MNGAT4LZDYS3V47QWBANCNFSM5P7VZ7RA . You are receiving this because you were mentioned.Message ID: @.***>

SamKacer avatar Sep 07 '22 15:09 SamKacer

Reverted in #15032.

CyrilleB79 avatar Jun 20 '23 07:06 CyrilleB79

Continuing the conversation from #15024:

@SamKacer I am not so very opposed to a three-way configurable option for "When a line contains only unspoken punctuation" as @leonardder is, provided that the options at least include:

  • Don't touch/silence (default)
  • Speak the punctuation regardless of level in addition to your "Speak as blank line" option. It should also be profile aware.

And I think, if NV Access wants this, you need a separate option for these same issues in regards to lines containing only whitespace (i.e. #13431), as the situations are not identical.

"not so very opposed" doesn't mean I like it or recommend it, but I do recognize there are probably enough users who want it, at least for the whitespace case, that it may not be the worst thing ever provided it's configurable and not the default. It does have certain unfortunate parallels to what users can do with speech dictionaries. But it just feels like introducing "lie mode".

The last I will probably say on this, is that personally I would much prefer you wait until we get a filter extensionPoint implemented for speech.speech.speak. Or do that yourself before trying this again.

XLTechie avatar Jun 20 '23 08:06 XLTechie

To clarify, the case of tab (i.e. is the subject of this particular issue) doesn't easily generalize to all types of symbols/punctuation. I think it is perfectly valid to treat tabs as blank, as a line containing only tabs is blank from a visual perspective. To summarize what I said earlier, speaking symbols like brackets, underscores, etc. as blank when they are the only symbols on a line is a no go for me, even when optional. A replacement like "no text" is longer but much better covers the actual situation and is therefore perfectly valid. Thus, it's only just the invalidity of the concept of blankness that bothers me.

LeonarddeR avatar Jun 20 '23 08:06 LeonarddeR

I am not in favor of adding one or more parameters in the GUI until we are confident that a significant group of users will use each option of these parameters. Else, the number of parameters in the GUI risks to grow indefinitely. For more specific use cases, I'd prefer a separate add-on.

IMO by default, the lines with nothing or only blank characters (spaces, newline, tabs, nul char) should be reported as blank as they are in Narrator or in Jaws. Today "tab" is missing in the list of blank characters. Of course, care should be taken with the following situations:

  • The user has lowered the symbol level of some blanks symbols (e.g. non breaking spaces or tabs) so that they are reported at the current punctuation level
  • Reporting of indentation, no matter the punctuation level used

I think that providing a PR with what I have just described (even with no configurable option at all) could be a first step to solve this issue. Then we will be able to figure if people (and how many) need something more advanced. This step-by-step approach would also avoid to revert the whole work whereas a part of the work is completely valid and agreed by all.

CyrilleB79 avatar Jun 20 '23 08:06 CyrilleB79