NVDA ignores line breaks in PDFs, making some types of text like source code unreadable with Adobe Reader
Continuation of #7275
Steps to reproduce:
Download this file and try to read it with NVDA in Adobe Reader
Actual behavior:
This is NVDA's speech output on the third line
class Node: def __init__(self, value):
Expected behavior:
As is visible in the file (and obvious from context), this should be read as two separate lines, like so:
class Node:
def __init__(self, value):
These are two separate lines visually in the file.
From https://github.com/nvaccess/nvda/issues/7275#issuecomment-308015865
PDF has semantic tags for paragraphs, lists, tables and the like. However, it does not differentiate author inserted line breaks (as in source code or poetry, sometimes known as hard line breaks) from line breaks used to wrap text which cannot fit on a single line (sometimes known as soft line breaks). Because NVDA splits text into lines itself (according to the "Maximum number of characters on one line" Browse Mode setting), we strip line break characters, as otherwise, you end up with a lot of long lines followed by short lines (as I recall happened in JAWS when I used it years ago). Having spoken to someone involved in PDF accessibility specification writing, my understanding is that the correct way to author such content is to tag each line as a separate list item or paragraph. Unfortunately, it seems no one actually does this in the wild. I think the only way we could reasonably solve this is to ignore NVDA's own settings for splitting lines and instead use only the line breaks in the PDF. That would also require us to not treat line breaks as paragraphs for PDF. This would be somewhat inconsistent with browse mode everywhere else, but I think consistency is probably outweighed by usability here.
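The difference between the two approaches described in that comment can be sketched in a few lines of Python. This is only an illustration, not NVDA's actual implementation: the function names are hypothetical, `textwrap` stands in for whatever splitting logic NVDA really uses, and the default of 100 for `max_chars` is an assumption about the "Maximum number of characters on one line" setting.

```python
import textwrap


def reflow_ignoring_breaks(text: str, max_chars: int = 100) -> list[str]:
    """Strip all line breaks, then split purely by length.

    Hypothetical stand-in for the current behaviour described above.
    """
    flattened = " ".join(text.split())  # collapses newlines and runs of spaces
    return textwrap.wrap(flattened, width=max_chars)


def reflow_respecting_breaks(text: str, max_chars: int = 100) -> list[str]:
    """Keep every line break found in the text; only wrap overlong lines.

    Hypothetical stand-in for the proposed behaviour.
    """
    lines: list[str] = []
    for source_line in text.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep it explicitly
        lines.extend(textwrap.wrap(source_line, width=max_chars) or [""])
    return lines


sample = "class Node:\ndef __init__(self, value):"
print(reflow_ignoring_breaks(sample))    # one merged line
print(reflow_respecting_breaks(sample))  # two separate lines
```

With the first function the two source lines collapse into the single line reported under "Actual behavior"; with the second they stay separate, at the cost of also keeping every soft line break that the PDF happens to contain.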
NVDA logs, crash dumps and other attachments:
System configuration
NVDA installed/portable/running from source:
Installed
NVDA version:
alpha-34198,67f6cb99 (2025.1.0.34198)
Windows version:
Windows 11 23H2 (OS Build 22631.4317)
Name and version of other software in use when reproducing the issue:
Adobe Reader 2024.003.20180
Other information about your system:
Other questions
Does the issue still occur after restarting your computer?
Yes
Have you tried any other versions of NVDA? If so, please report their behaviors.
Yes, this has been an issue since 2017
If NVDA add-ons are disabled, is your problem still occurring?
Yes
Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?
Yes
Should this be P2 instead? I would imagine that reading PDFs with Adobe Reader is somewhat common.
Well, the PDF is not tagged at all. If I produce a tagged version (attached) with a current LuaLaTeX, then line breaks are inserted. In the Speech Viewer I then get:
Here is a linked list node for the following questions. The empty linked list will be represented as
None.
class Node:
def __init__(self, value):
self.value = value
self.next = None
In the PDF, the code lines start with real space characters, but sadly they are ignored, so the indentation (which can be meaningful in code) is lost.
The tagging is not the final version LaTeX will use; we are waiting on a veraPDF update that would allow us to use Code for the code part and Sub for the single lines.
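To illustrate why the leading spaces mentioned above matter, here is a minimal Python sketch of how indentation could be recovered from a line if the spaces were passed through. The helper name `split_indentation` is hypothetical and this is not NVDA code; the indentation widths in the sample (4 and 8 spaces) are also an assumption, since the speech output above no longer contains them.

```python
def split_indentation(line: str) -> tuple[int, str]:
    """Return (number of leading spaces, remaining text) for one line."""
    stripped = line.lstrip(" ")
    return len(line) - len(stripped), stripped


# Assumed original layout of the code from the PDF (widths are a guess).
code_lines = [
    "class Node:",
    "    def __init__(self, value):",
    "        self.value = value",
    "        self.next = None",
]

for line in code_lines:
    indent, content = split_indentation(line)
    print(f"{indent} spaces: {content}")
```

Once the leading spaces are dropped, every line reports an indentation of 0, and the nesting of the method body can no longer be recovered from the text.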
I recently opened a tagged copy of a bank statement and experienced this issue. Thinking it was an issue with my bank's automated system, I then used Microsoft Word for Microsoft 365 Version 2504 to create a tagged test PDF with hard line breaks, i.e., Shift+Enter. I then used Adobe Acrobat Reader DC 2025.001.20531 to open the file.
Obviously I don't expect NV Access to fix the PDF authoring tool to use proper semantics, but JAWS identifies the line breaks, OCR confirms that the line breaks exist, and so I would expect that NVDA would respect that semantic information, just as it does for the web in browse mode. The HTML equivalent of this, using NVDA's browse mode, would be something like:
data:text/html,<p>This is the first paragraph. It has no line breaks and is relying on the reflowing as set by the Acrobat software.</p><p>This second paragraph has line breaks <br/>in it to make sure that Acrobat and NVDA <br/>are working well together</p>