PDF Tags (accessibility)

Open christophersavory opened this issue 2 years ago • 16 comments

I found this discussion regarding iText tags and accessibility in a forum from 2012. https://forums.opentext.com/forums/developer/discussion/52337/how-do-you-create-an-accessible-pdf-report-with-birt

I couldn't find anything regarding PDF accessibility in the BIRT Docs. https://eclipse.github.io/birt-website/docs/t_brief-editor-tour

Has PDF Emitter accessibility (tags) been implemented into BIRT? If not, is it on the roadmap?

christophersavory, Mar 16 '23

I asked the same question a few years ago. It is not on the roadmap. I think a great many people worldwide would benefit from PDF/UA support.

PDFs generated for authorities MUST support this in the EU.

But nobody is willing to pay for it. And it will be a lot of work, so it cannot be done by an enthusiastic hobby programmer.

Where are the big companies, where are the states, where is the EU? IMHO this is a topic where it is necessary to actually pay for open source development.

From a technical point of view:

BIRT uses OpenPDF for PDF creation. AFAIK OpenPDF does not support creating Tagged PDF which is a precondition for PDF/UA. See https://github.com/LibrePDF/OpenPDF/issues/181

The BIRT community can only start developing PDF/UA support for BIRT after OpenPDF added support for it.

At least one tiny bit of preparation is included in BIRT: there is a PDF tag type property. This was certainly meant to assist in creating tagged PDFs, but AFAIK this property isn't actually used.

Does anyone know whether the commercial BIRT product supported creating tagged PDFs?

hvbtup, Mar 17 '23

It seems like JasperReports supports creating PDF/UA. Internally, JasperReports uses OpenPDF just like BIRT (a patched version at the moment, see https://github.com/LibrePDF/OpenPDF/pull/765)

So, in contrast to what I said in my previous comment:

It would certainly be possible to create PDF/UA with BIRT.

It's just that the OpenPDF community itself is not focused on this and doesn't provide examples. But if JasperReports can do it, why shouldn't we?

Still, this is certainly a lot of work.

hvbtup, Mar 23 '23

It looks like OpenPDF might already support PDF/A-1a and PDF/A-1b.

See lines 1738 and 1740 of https://github.com/LibrePDF/OpenPDF/blob/3b38ad8588669d24fd1f772ec10bb516e996e3c1/openpdf/src/main/java/com/lowagie/text/pdf/PdfWriter.java

Do we just need to create a new EMITTER_ID that sets the correct PDFXConformance on the PdfWriter?
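A note on that question, hedged because I have not traced OpenPDF's internals in detail: PDF/A-1 (ISO 19005-1) declares its conformance level through the XMP `pdfaid` schema, and level A additionally requires a tagged logical structure, which BIRT's PDF emitter does not produce yet. So an emitter ID that only sets the conformance flag could at most aim at PDF/A-1b. The stdlib-only Java sketch below uses hypothetical names (`PdfAIdentification`, `Level`) and is not OpenPDF's API; it only shows the metadata that distinguishes the two levels and which level depends on tagging.

```java
import java.util.Locale;

/** Hypothetical helper (not OpenPDF API) modelling the PDF/A-1 identification written to XMP. */
final class PdfAIdentification {

    /** PDF/A-1 conformance levels: A (accessible) and B (basic). */
    enum Level { A, B }

    /** Level A additionally requires Tagged PDF with a logical structure tree; level B does not. */
    static boolean requiresTaggedStructure(Level level) {
        return level == Level.A;
    }

    /** Builds the XMP rdf:Description that declares PDF/A-1a or PDF/A-1b. */
    static String xmpIdentification(Level level) {
        return "<rdf:Description rdf:about=\"\"\n"
                + "    xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n"
                + "  <pdfaid:part>1</pdfaid:part>\n"
                + "  <pdfaid:conformance>" + level.name() + "</pdfaid:conformance>\n"
                + "</rdf:Description>";
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            String name = "PDF/A-1" + level.name().toLowerCase(Locale.ROOT);
            System.out.println(name + " requires tags: " + requiresTaggedStructure(level));
        }
        System.out.println(xmpIdentification(Level.A));
    }
}
```

The metadata is the easy part; for PDF/A-1a the hard work is the structure tree that the flag merely promises.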

christophersavory, May 12 '23

Any updates on implementing tagged PDF functionality? Has anyone made progress on this?

MayurDeore, Aug 23 '23

I'm on vacation this week and spent some time investigating yesterday. I was able to create a valid PDF/UA-1 document using the rather low-level functions of OpenPDF (validated with the PAC 2021 validator). The idea is that you create a structure tree containing the logical structure of the content (similar to basic HTML) and link every piece of content on the pages to one of these structure elements. Basically, the structure elements correspond to the report item instances. So I really think it is possible to add this to BIRT. But there is a lot to think about. For example, do we need to distinguish between locale and language? And while a BIRT report item typically corresponds to a specific tag in the structure tree, this is not always the case, so this must be configurable.
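To make the linkage described above concrete: in Tagged PDF, each piece of page content is wrapped in a marked-content sequence such as `/P <</MCID 0>> BDC ... EMC`, where the MCID is unique per page, and the owning structure element refers back to it by (page, MCID). The stdlib-only Java sketch below models this with hypothetical classes (`StructElem`, `TaggedPage`); it is neither OpenPDF's API nor how BIRT implements it, and the drawing operators are placeholders. It also shows that one structure element can own content on several pages, which is what page breaks require.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal model of a PDF structure tree linked to page content via MCIDs. */
final class TaggingSketch {

    /** A structure element such as Document, H1 or P. */
    static final class StructElem {
        final String type;
        final List<StructElem> kids = new ArrayList<>();
        final List<String> contentRefs = new ArrayList<>(); // "pageIndex:mcid" entries

        StructElem(String type) {
            this.type = type;
        }

        StructElem add(StructElem kid) {
            kids.add(kid);
            return kid;
        }
    }

    /**
     * One page's content stream. MCIDs restart at 0 on every page. In a real file the page's
     * /StructParents key points into the catalog's ParentTree, whose array for that page is
     * indexed by MCID; parentTree below plays that role.
     */
    static final class TaggedPage {
        final int index;
        final StringBuilder content = new StringBuilder();
        final List<StructElem> parentTree = new ArrayList<>(); // position == MCID

        TaggedPage(int index) {
            this.index = index;
        }

        /** Wraps drawing operators in a marked-content sequence owned by elem. */
        void tag(StructElem elem, String operators) {
            int mcid = parentTree.size();
            parentTree.add(elem);
            elem.contentRefs.add(index + ":" + mcid);
            content.append('/').append(elem.type)
                    .append(" <</MCID ").append(mcid).append(">> BDC\n")
                    .append(operators).append("\nEMC\n");
        }
    }

    public static void main(String[] args) {
        StructElem doc = new StructElem("Document");
        StructElem heading = doc.add(new StructElem("H1"));
        StructElem para = doc.add(new StructElem("P"));

        TaggedPage page1 = new TaggedPage(0);
        TaggedPage page2 = new TaggedPage(1);
        page1.tag(heading, "BT /F1 12 Tf (Report title) Tj ET");
        page1.tag(para, "BT /F1 12 Tf (First half of a long paragraph) Tj ET");
        page2.tag(para, "BT /F1 12 Tf (Second half after the page break) Tj ET");

        System.out.print(page2.content);
        System.out.println(para.type + " -> " + para.contentRefs); // P -> [0:1, 1:0]
    }
}
```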

hvbargen, Oct 04 '23

Reminder to myself: ATM the example is saved to a private GH repo.

hvbargen, Oct 04 '23

@hvbargen, may I ask you to share your experimental code, if possible? I'm making similar attempts to integrate PDF/A tags into a custom PDF emitter, and it would be really helpful.

luzhanov, Oct 04 '23

OK, here it is:

https://github.com/hvbargen/openpdf-ua

As I said, I'm convinced that it is possible to add PDF/UA (and PDF/A) support to BIRT. But this cannot work as a one-man show. We need people who can help with specifying, programming, and testing. Anyway, this can only start after 4.14.

hvbargen, Oct 04 '23

Thank you @hvbtup, this is really helpful! I can see that my approach with the BIRT PDF emitter is similar to yours: first I add the tag structure in a similar way, but the trickiest part is tagging the content itself (every text entry, image, etc.).

Currently we're using BIRT 4.9.0, as it is the latest POM-based version we can integrate into our application. This issue is blocking us from moving to newer versions: https://github.com/eclipse-birt/birt/issues/625

luzhanov, Oct 05 '23

Is this solved with the latest PDF patches?

wimjongman, Sep 16 '24

(screenshot omitted)

wimjongman, Sep 16 '24

During my vacation, I started working on a proof of concept with my hobby account.

Today I copied this to our company's fork, so it is available at https://github.com/triestram-partner/birt/tree/tagged-pdf, at a very, very early stage.

I am able to create a PDF/UA file consisting of labels only (without any background colors or lines etc) which PAC 2024 accepts.

Currently I'm trying to add support for images, but PAC 2024 still reports one error for the resulting file.

The POC demonstrates that it is in fact possible to generate PDF/UA with a modified version of BIRT and OpenPDF as the backend. But there are lots of open issues. For example, I expect major challenges when it comes to page breaks and HTML dynamic text items. It might turn out that the simple concept of the implementation is not sufficient and we must start from scratch again, but I try to be optimistic. That said, there are also a lot of tasks which should be relatively easy, but all of this certainly amounts to several weeks or months of work before it can even be beta-tested.

This is definitely not something I (or anyone else) can develop alone as a hobby project, so I'm asking other developers for help.

For this, I think it would make sense to create a branch tagged-pdf in the official repository. @wimjongman: Any objections?

hvbtup, Sep 16 '24

No objections from me, I'm fine with a shared branch.

wimjongman, Sep 16 '24

OK, folks, help is welcome.

The branch is https://github.com/eclipse-birt/birt/tree/tagged-pdf

To test the branch, create a report with the following properties:

Report properties:

  • Set a locale like de-DE or en-US in the advanced properties

  • Specify a title.

  • In the PDF emitter options, select

    • PDF version: 1.7
    • PDF conformance: PDF/A-1A
    • PDF/UA conformance (or version?): 1 (let's start with PDF/UA-1 support, PDF/UA-2 support can be added later)

I'm not sure whether more settings are necessary (regarding fonts); I'll upload an example report later when I'm at home.

For the layout elements, be sure to add appropriate PDF tags like P or H1.

Choose run report ... as PDF from the menu in the all-in-one designer, then download the PDF from your browser. Use PAC 2024 or another tool to validate the PDF.
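Why the locale and title in the report properties listed above matter: as far as I understand PDF/UA-1 (ISO 14289-1) and what PAC checks, the file needs a document language in the catalog's `/Lang` entry, `/MarkInfo <</Marked true>>`, `/ViewerPreferences <</DisplayDocTitle true>>`, and a title in the XMP `dc:title`. The stdlib-only sketch below uses a hypothetical helper (`PdfUaCatalogSketch`, not a BIRT or OpenPDF class). It also touches the locale-vs-language question raised earlier in the thread: a Java/BIRT locale string like `de_DE` is converted to the BCP 47 tag `de-DE` with `Locale.forLanguageTag(...).toLanguageTag()`.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: catalog entries PDF/UA-1 expects, derived from report properties. */
final class PdfUaCatalogSketch {

    /** Java/BIRT locale strings use '_' (de_DE); the PDF /Lang entry expects a BCP 47 tag (de-DE). */
    static String pdfLang(String birtLocale) {
        return Locale.forLanguageTag(birtLocale.replace('_', '-')).toLanguageTag();
    }

    /**
     * Returns catalog entries as PDF syntax fragments, in insertion order. The title itself
     * belongs in the XMP dc:title; here it is only checked because PDF/UA-1 requires one.
     */
    static Map<String, String> catalogEntries(String birtLocale, String title) {
        if (title == null || title.isBlank()) {
            throw new IllegalArgumentException("PDF/UA-1 requires a document title");
        }
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("/Lang", "(" + pdfLang(birtLocale) + ")");
        entries.put("/MarkInfo", "<</Marked true>>");
        entries.put("/ViewerPreferences", "<</DisplayDocTitle true>>");
        return entries;
    }

    public static void main(String[] args) {
        // Prints one "key value" line per catalog entry.
        catalogEntries("de_DE", "Monthly sales")
                .forEach((key, value) -> System.out.println(key + " " + value));
    }
}
```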

hvbtup, Sep 16 '24

This is copied from the commit...


I would much prefer that you push changes to your own fork and create pull requests to kick off builds, rather than pushing branches into the main repository.

The workflow for how to do that is described here:

https://github.com/orgs/eclipse-simrel/discussions/3

I.e., I don't expect to see a bunch of "personal" branches being built (screenshot omitted), but rather PR builds (screenshot omitted).

Via the PRs there is a place to review and discuss changes.

You can easily amend your commits and force-push to your fork to kick off new builds. That's also described in the discussion above.

Your build is failing because it isn't based on master and hence has an older target platform referring to a p2 repository that no longer exists.

Before starting a PR effort, always pull master and create your branch off master...

merks, Sep 16 '24

This is an example report which only produces one error when checked with PAC 2024: text_and_image_example.zip

hvbtup, Sep 17 '24

I just want to inform you that I made some good progress in the past few days.

See the branch pdf-tag-page-break in the fork for my hobby account.

I can now create tables and lists that are longer than one page, including a valid tag structure. This even works for tables with captions (I never noticed this property in the advanced settings, and it seems to be generally forgotten; for example, there is no predefined style for it).

However, there is still a lot to test and to code:

  • I did not test splits of table rows (that includes splits of cells).
  • I did not test splits of plain text or HTML text.
  • I did not test grids, and I don't know how "forms" should be handled regarding the structure. Many of our reports use a grid to show properties, e.g. for an order, with property name and property value, e.g. "Order no: 24-12345".
  • I am not interested in charts.
  • I am unsure how to handle HTML dynamic text (in particular, headings).
  • I don't have a screen reader, only PAC 2024.
  • While the structure is working for multi-page tables, I am not quite satisfied with the way I have coded this. I needed half a dozen new attributes for ContainerArea instances, which of course adds some bytes to the memory needed per instance and there are lots of such instances. This information is only needed when PDF/UA is created, so I think it would be better to move this into a separate object that is only allocated when actually needed.
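The last point describes a common memory trade-off. One way to handle it, sketched below with hypothetical names (`ContainerAreaSketch` is a stand-in, not BIRT's real `ContainerArea`, and `TagInfo` is invented for illustration), is to keep a single nullable reference on each area and allocate the tagging data only when a tagged PDF is actually being produced. Non-tagged renders then pay for one null reference per instance instead of half a dozen fields. The lazy getter is not synchronized, which assumes each area is laid out by one thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a layout area; not BIRT's real ContainerArea. */
final class ContainerAreaSketch {

    /** Tagging state needed only for PDF/UA output, e.g. for a table continued on the next page. */
    static final class TagInfo {
        String structType;             // e.g. "Table", "TR", "P"
        boolean continuedFromPrevPage; // area continues an element split by a page break
        final List<Integer> mcids = new ArrayList<>();
    }

    private final String name;
    private TagInfo tagInfo; // stays null unless a tagged PDF is being produced

    ContainerAreaSketch(String name) {
        this.name = name;
    }

    /** Lazily creates the side object on first use (not thread-safe). */
    TagInfo tagInfo() {
        if (tagInfo == null) {
            tagInfo = new TagInfo();
        }
        return tagInfo;
    }

    boolean hasTagInfo() {
        return tagInfo != null;
    }

    public static void main(String[] args) {
        ContainerAreaSketch plain = new ContainerAreaSketch("row-1");
        ContainerAreaSketch tagged = new ContainerAreaSketch("row-2");
        tagged.tagInfo().structType = "TR";
        tagged.tagInfo().mcids.add(0);
        System.out.println(plain.name + " allocated: " + plain.hasTagInfo());   // false
        System.out.println(tagged.name + " allocated: " + tagged.hasTagInfo()); // true
    }
}
```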

hvbtup, Dec 05 '24

Thanks for the update, Henning. It sounds like you are making great progress. Wonderful!

wimjongman, Dec 05 '24

Partial support for PDF/UA comes with BIRT 4.19 (see #2018).

Some example reports are in UI/org.eclipse.birt.report.designer.samplereports/samplereports/Reporting Feature Examples/Accessibility

"Your mileage will vary."

For example, charts are not supported.

PRs for improvements, e.g. adding PDF/UA support for charts or crosstabs, are welcome.

hvbtup, Feb 11 '25

Closing this, as 4.19 allows generating PDF/UA.

hvbtup, Feb 12 '25