Declarative way to add accessibility tags to PDFs

Open codekiln opened this issue 4 years ago • 16 comments

In many industries, accessibility is a big deal. It's impossible for a PDF to be considered accessible unless it's "tagged":

  1. WebAIM's guide to accessible PDFs
  2. Section 508's guide to creating accessible PDFs
  3. Acrobat's Accessibility Portal
  4. U of MN's guide to Accessible PDFs
  5. WebAIM mailing list archives that mention WeasyPrint

If accessibility tagging were added to WeasyPrint, it would be an ace in the hole for many industries. The current state of the art seems to be a whole lot of clicking in Acrobat.

If the input HTML had certain special attributes, WeasyPrint could apply the equivalent accessibility tags.

codekiln avatar Mar 25 '20 19:03 codekiln

It would be possible if we had our own PDF generator or post-processor. If someone is interested in replacing Cairo…

liZe avatar Apr 02 '20 12:04 liZe

@liZe

  1. How much work do you think it would be to replace Cairo?
  2. What would you suggest replacing it with?

malnajdi avatar Apr 07 '20 14:04 malnajdi

  1. How much work do you think it would be to replace Cairo?

A lot (see #841).

  2. What would you suggest replacing it with?

A pure-Python library generating PDF. I’m not fond of reportlab, but something like that should be OK.

(I’m currently reading the PDF spec to write such a library, but that’s a secret :wink:.)

liZe avatar Apr 07 '20 16:04 liZe

This sounds very interesting. I would really appreciate the effort to implement the tagging of PDF documents.

NicolasGoeddel avatar Jul 30 '20 15:07 NicolasGoeddel

Dear Lize!

We also now have the requirement of a tagged and accessible PDF. After some research I found: https://www.cairographics.org/news/cairo-1.16.0/

The PDF backend has gained support for .. tags. Tags permit adding logical info such as headings, tables, figures, etc. that facilitates indexing, accessibility, text reflow, searching, and extraction of the tagged items to other software. For details on this new PDF functionality, see: https://lists.cairographics.org/archives/cairo/2016-June/027427.html

And directly in the Cairo docs: https://www.cairographics.org/manual/cairo-Tags-and-Links.html#doc-struct

It seems to me that Cairo does indeed support the required structural tagging.

Since standard HTML already uses all these tags, it should be quite straightforward to map them to the appropriate PDF tags?

Let me know if we can assist in any way to help this issue along!

As always, thanks a lot for your great library!

Johannes

JohannesMunk avatar Aug 08 '20 09:08 JohannesMunk

It seems to me that Cairo does indeed support the required structural tagging.

It does, you’re right!

It means that it could be possible to add tags using Cairo. I don’t think that there’s currently an easy way to do this using only the public API of WeasyPrint, even with the new finisher option. We’ll drop Cairo soon, but it will be possible to do this with another library too.

If anyone wants to work on this, I can help!

liZe avatar Aug 09 '20 14:08 liZe

Great news!

  1. Without any promises, how soon will you be dropping Cairo?
  2. Assuming that 1) will not be happening in the next 4 weeks, if we supported your patreon campaign, would you be willing to tackle this issue still using Cairo?
  3. If not, I would welcome some ideas on where and how you would like to see me implement this..

JohannesMunk avatar Aug 09 '20 15:08 JohannesMunk

  1. Without any promises, how soon will you be dropping Cairo?

The next release will come in September (I hope); the one after that will be without Cairo (it may take time for users to test and report broken corner cases).

2. Assuming that 1) will not be happening in the next 4 weeks, if we supported your patreon campaign, would you be willing to tackle this issue still using Cairo?

Big secret: we’re currently building a small structure dedicated to WeasyPrint and its dependencies (and misc free software). You shouldn’t give to the Patreon campaign; wait until the end of the month instead, when we’ll have more time and more resources to implement the features you want :wink:.

3. If not, I would welcome some ideas on where and how you would like to see me implement this..

It will be easier without Cairo, as we’ll have a dedicated library to create PDF files, and won’t have the Cairo surface / PDF bytestring separation as it’s done now.

liZe avatar Aug 09 '20 19:08 liZe

Sounds great! Looking forward to a bright future, where pdf writing is fully under your control :-)

But I can't put our current project on that timetable, so I just looked into the code. It was very straightforward to put a few calls to context.tag_begin and tag_end into draw.py with the appropriate mappings from element_tag. Works great! Checked with PAC3 and Adobe Acrobat. The next step will be nested sections and table structures.

Will keep you posted!

JohannesMunk avatar Aug 10 '20 16:08 JohannesMunk
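
The approach described above (wrapping drawing calls in tag_begin / tag_end with a name derived from element_tag) might look roughly like the sketch below. This is not the code JohannesMunk wrote, which was never posted in this thread. The mapping table, the pdf_tag helper, and the RecordingContext stand-in are all hypothetical names introduced for illustration. The sketch only assumes a context object that exposes tag_begin(name, attributes) and tag_end(name), as Cairo contexts do since Cairo 1.16.

```python
from contextlib import contextmanager

# Hypothetical mapping from HTML element names to PDF standard structure
# types. The right-hand names come from the PDF specification; the choice
# of which HTML elements to map is an assumption made for this sketch.
HTML_TO_PDF_TAG = {
    "h1": "H1", "h2": "H2", "h3": "H3",
    "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P",
    "table": "Table", "tr": "TR", "th": "TH", "td": "TD",
    "ul": "L", "ol": "L", "li": "LI",
    "img": "Figure",
    "section": "Sect",
}


@contextmanager
def pdf_tag(context, element_tag):
    """Wrap drawing code in a structure tag when element_tag is mapped."""
    pdf_name = HTML_TO_PDF_TAG.get(element_tag)
    if pdf_name is None:
        # Unmapped elements are drawn without any structure tag.
        yield
        return
    context.tag_begin(pdf_name, "")
    try:
        yield
    finally:
        # Always close the tag so begin/end calls stay balanced.
        context.tag_end(pdf_name)


class RecordingContext:
    """Stand-in for a Cairo context that only records the tag calls."""

    def __init__(self):
        self.calls = []

    def tag_begin(self, name, attributes):
        self.calls.append(("begin", name))

    def tag_end(self, name):
        self.calls.append(("end", name))
```

In a drawing routine, the body of `with pdf_tag(context, box.element_tag):` would contain the existing drawing calls for that box. Nested sections and tables would need the begin/end pairs to follow the box tree, which this sketch does not attempt.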

So I just looked into the code. It was very straightforward to put a few calls to context.tag_begin and tag_end into draw.py with the appropriate mappings from element_tag.

Would you be willing to share a snippet that explains how to implement this strategy @JohannesMunk ?

noelleleigh avatar Nov 30 '20 20:11 noelleleigh

With the Cairo dependency removed, are there any plans to accommodate Tag structures in Weasyprint/pydyf?

dariux avatar Jun 18 '21 15:06 dariux

Any update on this feature? @JohannesMunk, can you please share the changes you made in draw.py?

duklin avatar Aug 11 '21 10:08 duklin

any update on this feature?

Cairo has been removed and we now have a custom PDF writer, so we technically have everything needed to add this feature. That’s not in the roadmap yet, but we’d be happy to hear about the different use cases users may have. Don’t hesitate to add comments in this issue if you want.

liZe avatar Aug 17 '21 14:08 liZe

Is there any news about this issue? We are really interested in this feature. Thanks!

julian-kappler avatar Apr 28 '22 11:04 julian-kappler

Is there any news about this issue? We are really interested in this feature. Thanks!

We would really like to work on this feature (that would be really nice for accessibility), but it’s quite complex and would require some time to choose the right API, define what’s actually supported and, of course, implement the feature.

After version 55 is released (soon!), we have a lot of things to do in version 56, with solid support for Flexbox and Grid and some sponsored features awaiting. If some companies are interested in sponsoring this feature and getting it earlier, don’t hesitate to get in touch!

liZe avatar Apr 28 '22 15:04 liZe

Version 57 will include PDF/UA support, including accessibility tags generated automatically from the HTML structure of the document. If anyone is interested in this feature, don’t hesitate to test the current master branch and add a comment here!

liZe avatar Sep 16 '22 22:09 liZe

Hello @liZe! Can you clarify how I can test PDF/UA support? I'm using the free PAC program to test PDF/UA support.

How can I add these settings?

image

bandirom avatar Dec 02 '22 14:12 bandirom

Hello @bandirom, Did you set pdf-variant to pdf/ua-1 to enable PDF/UA generation?

grewn0uille avatar Dec 02 '22 15:12 grewn0uille

Hello @bandirom, Did you set pdf-variant to pdf/ua-1 to enable PDF/UA generation?

Probably not :(. Can you help with that? How do I set it?

bandirom avatar Dec 02 '22 16:12 bandirom

Probably not :(. Can you help with that? How do I set it?

If you’re using WeasyPrint from the command line, you can set the --pdf-variant option like this: weasyprint --pdf-variant=pdf/ua-1 document.html document.pdf

If you’re calling write_pdf() in your code, you can set the variant parameter to "pdf/ua-1".

grewn0uille avatar Dec 02 '22 16:12 grewn0uille
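
Both options from the reply above can be sketched in Python as follows. The helper names build_weasyprint_command and write_pdf_ua are hypothetical, as are the file names. The variant keyword is taken from the reply above and matches the release discussed in this thread; other releases may name the parameter differently, so check the documentation for your installed version.

```python
PDF_UA_VARIANT = "pdf/ua-1"


def build_weasyprint_command(source, target, variant=PDF_UA_VARIANT):
    """Return the command-line arguments for a PDF/UA export.

    The list can be passed to subprocess.run() as-is.
    """
    return ["weasyprint", f"--pdf-variant={variant}", source, target]


def write_pdf_ua(source, target):
    """Render an HTML file to a PDF/UA document through the Python API."""
    # Imported lazily so this module can be loaded without WeasyPrint
    # installed, for example when only the command-line helper is needed.
    from weasyprint import HTML

    # "variant" is the parameter name given in this thread; treat it as
    # version-dependent.
    HTML(filename=source).write_pdf(target, variant=PDF_UA_VARIANT)
```

For example, build_weasyprint_command("document.html", "document.pdf") produces the same invocation as the command line quoted above.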

@codekiln @malnajdi @JohannesMunk @noelleleigh @dariux @duklin Now that PDF/UA includes accessibility tags, is that sufficient for your needs?

liZe avatar Dec 06 '22 13:12 liZe

I'm testing my PDF file, and Adobe shows an almost good result. I'm checking the detected issues.

image

bandirom avatar Dec 06 '22 16:12 bandirom

image

image

I tested a table, and I don't understand what the empty boxes are. If I manually remove these boxes, the tests pass.

Can you help?

UPD: even if I make the table empty, like <table></table>, I still get these 2 boxes.

bandirom avatar Dec 07 '22 12:12 bandirom

I'm testing my PDF file, and Adobe shows an almost good result. I'm checking the detected issues.

Thanks a lot for your report. Could you please open a new, separate issue stating that tags are broken with tables, with a link to your comment (there’s no need to copy the text/images)?

liZe avatar Dec 23 '22 14:12 liZe

@codekiln @malnajdi @JohannesMunk @noelleleigh @dariux @duklin Now that PDF/UA includes accessibility tags, is that sufficient for your needs?

Don’t hesitate to add a comment here if there’s something else to add to the current PDF/UA export.

If you have found a bug (invalid documents, problems with the structure), please open a separate issue!

liZe avatar Dec 23 '22 14:12 liZe