
NVDA ignores line breaks in PDFs, making some types of text like source code unreadable with Adobe Reader

Open Neurrone opened this issue 1 year ago • 3 comments

Continuation of #7275

Steps to reproduce:

Download this file and try to read it with NVDA in Adobe Reader

Actual behavior:

This is NVDA's speech output on the third line

class Node: def __init__(self, value): 

Expected behavior:

Visually (and as is obvious from context), this should be two separate lines, like so:

class Node:
  def __init__(self, value): 

These are two separate lines visually in the file.

From https://github.com/nvaccess/nvda/issues/7275#issuecomment-308015865

PDF has semantic tags for paragraphs, lists, tables and the like. However, it does not differentiate author inserted line breaks (as in source code or poetry, sometimes known as hard line breaks) from line breaks used to wrap text which cannot fit on a single line (sometimes known as soft line breaks). Because NVDA splits text into lines itself (according to the "Maximum number of characters on one line" Browse Mode setting), we strip line break characters, as otherwise, you end up with a lot of long lines followed by short lines (as I recall happened in JAWS when I used it years ago). Having spoken to someone involved in PDF accessibility specification writing, my understanding is that the correct way to author such content is to tag each line as a separate list item or paragraph. Unfortunately, it seems no one actually does this in the wild. I think the only way we could reasonably solve this is to ignore NVDA's own settings for splitting lines and instead use only the line breaks in the PDF. That would also require us to not treat line breaks as paragraphs for PDF. This would be somewhat inconsistent with browse mode everywhere else, but I think consistency is probably outweighed by usability here.
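
The difference between the two strategies described in that comment can be sketched in a few lines of Python. This is only an illustration, not NVDA's actual code: the function names are hypothetical, and `max_chars=100` merely stands in for the "Maximum number of characters on one line" setting.

```python
import textwrap


def reflow_stripping_breaks(text: str, max_chars: int = 100) -> list[str]:
    """Hypothetical model of the current behaviour: discard every line
    break, then split the joined text into lines of at most max_chars."""
    joined = " ".join(text.split())
    return textwrap.wrap(joined, width=max_chars)


def reflow_keeping_breaks(text: str, max_chars: int = 100) -> list[str]:
    """Hypothetical model of the proposed behaviour: keep each line break
    from the document and only wrap lines longer than max_chars."""
    lines: list[str] = []
    for raw in text.splitlines():
        if len(raw) <= max_chars:
            # Short enough: keep the line untouched, indentation included.
            lines.append(raw)
        else:
            lines.extend(textwrap.wrap(raw, width=max_chars))
    return lines


sample = "class Node:\n  def __init__(self, value):"
print(reflow_stripping_breaks(sample))
print(reflow_keeping_breaks(sample))
```

With the two-line sample from this report, the first function yields a single merged line, matching the speech output above, while the second keeps the two lines separate.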

NVDA logs, crash dumps and other attachments:

System configuration

NVDA installed/portable/running from source:

Installed

NVDA version:

alpha-34198,67f6cb99 (2025.1.0.34198)

Windows version:

Windows 11 23H2 (OS Build 22631.4317)

Name and version of other software in use when reproducing the issue:

Adobe Reader 2024.003.20180

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

Yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

Yes, this has been an issue since 2017

If NVDA add-ons are disabled, is your problem still occurring?

Yes

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

Yes

Neurrone avatar Oct 20 '24 14:10 Neurrone

Should this be P2 instead? I would imagine that reading PDFs with Adobe Reader is somewhat common.

Neurrone avatar Oct 28 '24 04:10 Neurrone

Well, the PDF is not tagged at all. If I produce a tagged version (attached) with a current lualatex, then line breaks are inserted. In the Speech Viewer I then get:

Here is a linked list node for the following questions. The empty linked list will be represented as
None.
class Node:
def __init__(self, value):
self.value = value
self.next = None

test-verbatim.pdf

In the PDF the code lines start with real space chars, but sadly they are ignored and so the indentation (which can be meaningful in code) is lost.
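
As a small illustration of why the lost indentation matters for Python in particular, the sketch below checks whether the snippet still parses once leading whitespace is removed. The four-space indentation and the helper name `parses` are assumptions made for the example, not something taken from the PDF.

```python
import ast

# The snippet from the PDF, with an assumed four-space indentation.
indented = (
    "class Node:\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n"
    "        self.next = None\n"
)

# The same snippet with leading whitespace dropped from every line,
# mirroring the speech viewer output quoted above.
flattened = "\n".join(line.lstrip() for line in indented.splitlines())


def parses(source: str) -> bool:
    """Return True if source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


print(parses(indented), parses(flattened))
```

The indented version parses, while the flattened one does not, so a listener who hears only the flattened lines cannot reliably reconstruct the structure of the code.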

The tagging is not the final version LaTeX will use; we are waiting on a veraPDF update that would allow us to use Code for the code part and Sub for the individual lines.

u-fischer avatar Oct 31 '24 18:10 u-fischer

I recently opened a tagged copy of a bank statement and experienced this issue. Thinking it was an issue with my bank's automated system, I then used Microsoft Word for Microsoft 365 Version 2504 to create a tagged test PDF with hard line breaks, i.e., Shift+Enter. I then used Adobe Acrobat Reader DC 2025.001.20531 to open the file.

Obviously I don't expect NV Access to fix the PDF authoring tool to use proper semantics, but JAWS identifies the line breaks, OCR confirms that the line breaks exist, and so I would expect that NVDA would respect that semantic information, just as it does for the web in browse mode. The HTML equivalent of this, using NVDA's browse mode, would be something like:

data:text/html,<p>This is the first paragraph. It has no line breaks and is relying on the reflowing as set by the Acrobat software.</p><p>This second paragraph has line breaks <br/>in it to make sure that Acrobat and NVDA <br/>are working well together</p>

tmthywynn8 avatar Jun 14 '25 22:06 tmthywynn8