nvda icon indicating copy to clipboard operation
nvda copied to clipboard

Add ability to read math in PDF documents

Open NSoiffer opened this issue 1 year ago • 8 comments

Link to issue number:

Fixes https://github.com/nvaccess/nvda/issues/9288

Summary of the issue:

PDF 2.0 added Associated Files (AF). It also describes a method for Formula tags to make use of AF that contain MathML. The LaTeX Project (the group that maintains LaTeX) has released an update to LaTeX that uses this technique. Hence, there will soon be a large body of PDF documents generated from LaTeX (pdflatex and lualatex) that contain MathML.

In conjunction with Foxit and an informal agreement with someone at Adobe, we agreed on a method to expose the MathML in an AF without a change to the PDF accessibility interface: the Formula tag gets role=Math (in windows, ROLE_SYSTEM_EQUATION) and the contents of tag is the MathML.

Note: this does not change the legality of the previous method of fully tagging the PDF math with children elements pointing to subexpressions in the PDF. However, that method has proved difficult to implement for PDF generators. This method seems to be much simpler and hence will be used.

The latest release of Foxit contains the support of AF with MathML. So far, Adobe has not made a change but with Foxit and NVDA supporting this, there will be more of an impetuses to do so. According to the Foxit implementer, it only took 1-2 days to implement.

Description of user facing changes

The math in documents will be spoken and brailled just as it is done for HTML documents. It will also be navigable. This should work with any of the MathML add-ons.

Description of development approach

Support required only about 3 lines to be added to the AdobeAcobat.py file. I changed a few more lines to add debug warnings when various COM interfaces were not found.

There was a commit in January 2024 that wiped out the MathML support in PDF in favor of alt text. This was in the .cpp file that is part of this PR. This PR mostly reverts that change. Alt text is still supported via the creation of a MathML <mtext> element. Potentially, this is a better solution because sometimes the alt text is LaTeX and LaTeX contains lots of punctuation characters that are not spoken by NVDA by default. Pushing this to the Math handler gives them the ability to override this behavior and speak all the characters. Currently MathCAT just passes the mtext content directly to NVDA, but I will look into making it smarter about that.

Because Adobe Reader currently does not handle AFs, the alt text will get read if a formula has both an AF and alt text.

Testing strategy:

Here are two PDF files for testing:

  1. Several inline and display equations
  2. Some equations with alt text

Known issues with pull request:

None

Code Review Checklist:

I need some guidance on what to do, if anything, about "change log entry" and unit or system tests.

  • [x] Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • [x] Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • [x] UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • [x] API is compatible with existing add-ons.
  • [x] Security precautions taken.

@coderabbitai summary

NSoiffer avatar Oct 10 '24 22:10 NSoiffer

This sounds so beautiful! @NSoiffer, thanks a lot to follow these evolutions. I ended my studies in 2019, and never had MathML materials; but it's very comforting to know that, in the future, I could find accessible PDF almost as easily as a sighted person.

ABuffEr avatar Oct 11 '24 07:10 ABuffEr

@NSoiffer Just curious, is this new support part of the base amspath package or does it require an inclusion of an additional package at build time?

codeofdusk avatar Oct 11 '24 22:10 codeofdusk

As for "change log entry", add a new item in user_docs/en/changes.md (similar to the others). Include the undelying issue number and your username as an at mention.

It looks like NV Access might need to manually start CI for this PR. Be sure that existing unit and system tests pass.

codeofdusk avatar Oct 11 '24 22:10 codeofdusk

@NSoiffer Just curious, is this new support part of the base amspath package or does it require an inclusion of an additional package at build time?

This is separate, although it does include support for the amsmath package. It is a project that has been worked on for 4(?) years, funded by Adobe that is nearing completion of phase V (their last listed phase). Here is a relatively recent paper on their work. It (currently) requires adding some metadata at the start of the file:

\DocumentMetadata{
  lang        = en,
  pdfversion  = 2.0,
  pdfstandard = ua-2,
  pdfstandard = a-4f, %or a-4
  testphase   = 
   {phase-III,
    title,
    table,
    math,
    firstaid}  
}

The first part of the metadata will be required as it states some important things that go into PDF's metadata. I suspect the "testphase" part will not be required at some point in the future.

More info about the project is on their web page and also in their repo (it is a little out of date -- a month ago or so they started automatically generating MathML from TeX).

NSoiffer avatar Oct 12 '24 00:10 NSoiffer

@codeofdusk: thanks. I've added something to the userDoc file.

As for testing, the only code that I changed is PDF related. As far as I can find, there are no tests for PDF files. runsystemtests.bat -i "xxx" where "xxx" is "PDF", "Adobe", and "Acrobat" all come back with "Suite 'Robot' contains no tests matching tags". So no tests should have been broken.

Because the code does COM calls, I don't see a way to do Unit tests. I don't know how to do PDF system tests because one needs a PDF processor running (AdobeReader, Foxit Reader, ...). The need for a PDF processor probably makes system testing hard (impossible?). If someone can tell me how to do some tests, I'll add some. Otherwise, I think the checklist is complete.

NSoiffer avatar Oct 12 '24 03:10 NSoiffer

Having not heard any suggestions about how I could do testing, I've checked the box off. That means there are no tasks left. I think this is ready for review.

NSoiffer avatar Oct 14 '24 20:10 NSoiffer

  • Build (for testing PR): https://ci.appveyor.com/api/buildjobs/0e8xt8ak3i7n12qm/artifacts/output/
  • CI timing (mins): INIT 0.0, INSTALL_START 1.4, INSTALL_END 0.9, BUILD_START 0.0, FINISH_END 15.2

See test results for failed build of commit b6a64ad505

AppVeyorBot avatar Oct 17 '24 08:10 AppVeyorBot

To be honest I wouldn't be opposed to documenting some technical details on how to use this feature in the user guide. Even if they look somewhat professional. Overall, those who wish to use a screen reader to read mathematical content will probably always need to know some technical knowledge.

cary-rowen avatar Oct 17 '24 23:10 cary-rowen

GitHub says that @SaschaCowley requested updates. I must be missing something -- as far as I can see there are no unresolved requests. Can someone clarify for me what is not resolved?

Thanks.

NSoiffer avatar Oct 24 '24 05:10 NSoiffer

GitHub says that @SaschaCowley requested updates. I must be missing something -- as far as I can see there are no unresolved requests. Can someone clarify for me what is not resolved?

Thanks.

This is just because I have not yet submitted an approving review. My apologies for the delay, I've been off sick for the last four days.

SaschaCowley avatar Oct 25 '24 01:10 SaschaCowley

  • PASS: Translation comments check.
  • PASS: License check.
  • PASS: Unit tests.
  • PASS: Lint check.
  • FAIL: System tests (tags: installer NVDA). See test results for more information.
  • Build (for testing PR): https://ci.appveyor.com/api/buildjobs/u626a9h32hjde57y/artifacts/output/l10nUtil.exe nvda_snapshot_pr17276-34412,b46a1cde.exe
  • CI timing (mins): INIT 0.0, INSTALL_START 1.4, INSTALL_END 0.9, BUILD_START 0.0, BUILD_END 25.0, TESTSETUP_START 0.0, TESTSETUP_END 0.4, TEST_START 0.0, TEST_END 18.4, FINISH_END 0.2

See test results for failed build of commit b46a1cde7a

AppVeyorBot avatar Oct 25 '24 02:10 AppVeyorBot