WeasyPrint
Declarative way to add accessibility tags to PDFs
In many industries, accessibility is a big deal. It's impossible for a PDF to be considered accessible unless it's "tagged":
- WebAIM's guide to accessible PDFs
- Section 508's guide to creating accessible PDFs
- Acrobat's Accessibility Portal
- U of MN's guide to Accessible PDFs
- WebAIM mailing list archives that mention WeasyPrint
If accessibility tagging were added to WeasyPrint, it would be an ace in the hole for many industries. The current state of the art seems to be a whole lot of clicking in Acrobat.
If the input HTML had certain special attributes, WeasyPrint could apply the equivalent accessibility tags.
It would be possible if we had our own PDF generator or post-processor. If someone is interested in replacing Cairo…
@liZe
- How much work do you think it would take to replace Cairo?
- What would you suggest replacing it with?
- How much work do you think it would take to replace Cairo?
A lot (see #841).
- What would you suggest replacing it with?
A pure-Python library that generates PDF. I’m not fond of ReportLab, but something like that should be OK.
(I’m currently reading the PDF spec to write such a library, but that’s a secret :wink:.)
This sounds very interesting. I would really appreciate the effort to implement the tagging of PDF documents.
Dear Lize!
We also now have the requirement of a tagged and accessible PDF. After some research I found: https://www.cairographics.org/news/cairo-1.16.0/
The PDF backend has gained support for .. tags. Tags permit adding logical info such as headings, tables, figures, etc. that facilitates indexing, accessibility, text reflow, searching, and extraction of the tagged items to other software. For details on this new PDF functionality, see: https://lists.cairographics.org/archives/cairo/2016-June/027427.html
And directly in the Cairo docs: https://www.cairographics.org/manual/cairo-Tags-and-Links.html#doc-struct
It seems to me that Cairo does indeed support the required structural tagging.
Since standard HTML already uses all these tags, shouldn't it be quite straightforward to map them to the appropriate PDF tags?
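For illustration, such a mapping could be a plain lookup table. This is a minimal sketch assuming the standard structure types defined in the PDF 1.7 specification (ISO 32000-1, section 14.8.4); HTML_TO_PDF_STRUCT and pdf_struct_type are hypothetical names, not part of WeasyPrint or Cairo, and the table is simplified (a real list item, for example, would also need Lbl and LBody children).

```python
# Hypothetical, simplified mapping from HTML element names to PDF standard
# structure types (ISO 32000-1, 14.8.4). Not WeasyPrint's or Cairo's table.
HTML_TO_PDF_STRUCT = {
    "body": "Document",
    "section": "Sect",
    "article": "Art",
    "div": "Div",
    "h1": "H1", "h2": "H2", "h3": "H3", "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P",
    "ul": "L", "ol": "L", "li": "LI",
    "table": "Table", "thead": "THead", "tbody": "TBody", "tfoot": "TFoot",
    "tr": "TR", "th": "TH", "td": "TD",
    "img": "Figure", "figure": "Figure", "figcaption": "Caption",
    "a": "Link",
    "blockquote": "BlockQuote", "q": "Quote", "code": "Code",
    "span": "Span",
}


def pdf_struct_type(element_tag: str, inline: bool = False) -> str:
    """Return the PDF structure type for an HTML element name.

    Elements without an entry fall back to a generic grouping type:
    "Span" for inline content, "Div" for block content.
    """
    default = "Span" if inline else "Div"
    return HTML_TO_PDF_STRUCT.get(element_tag.lower(), default)
```

Falling back to Div or Span for elements such as nav or em is only one possible design choice; PDF also allows custom structure types mapped to standard ones through a role map.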
Let me know if we can assist in any way to move this issue along!
As always, thanks a lot for your great library!
Johannes
It does seem to me, that Cairo does indeed support the required structural tagging.
It does, you’re right!
It means that it could be possible to add tags using Cairo. I don’t think that there’s currently an easy way to do this using only the public API of WeasyPrint, even with the new finisher option. We’ll drop Cairo soon, but it will be possible to do this with another library too.
If anyone wants to work on this, I can help!
Great news!
1. Without any promises, how soon will you be dropping Cairo?
2. Assuming that 1) will not be happening in the next 4 weeks, if we supported your patreon campaign, would you be willing to tackle this issue still using Cairo?
3. If not, I would welcome some ideas on where and how you would like to see me implement this.
1. Without any promises, how soon will you be dropping Cairo?
The next release will come in September (I hope); the one after that will be without Cairo (it may take time for users to test it and report broken corner cases).
2. Assuming that 1) will not be happening in the next 4 weeks, if we supported your patreon campaign, would you be willing to tackle this issue still using Cairo?
Big secret: we’re currently building a small organization dedicated to WeasyPrint and its dependencies (and miscellaneous free software). You shouldn’t give to the Patreon campaign; wait for the end of the month instead, when we’ll have more time and more resources to implement the features you want :wink:.
3. If not, I would welcome some ideas on where and how you would like to see me implement this.
It will be easier without Cairo, as we’ll have a dedicated library to create PDF files and won’t have the separation between the Cairo surface and the PDF bytestring that we have now.
Sounds great! Looking forward to a bright future, where pdf writing is fully under your control :-)
But I can't put our current project on that timetable. So I just looked into the code: it was very straightforward to put a few calls to context.tag_begin and tag_end into draw.py, with the appropriate mappings from element_tag. Works great! I checked with PAC3 and Adobe Acrobat. The next step will be nested sections and table structures.
Will keep you posted!
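The thread never shows JohannesMunk's actual changes to draw.py, so the following is only a hedged sketch of the general idea: wrap the drawing of each element in a tag_begin/tag_end pair so the PDF structure nests like the HTML. It assumes pycairo's Context.tag_begin(tag_name, attributes) and Context.tag_end(tag_name) (cairo 1.16 or later). STRUCT_TYPES, tagged, draw_element, the (tag, children) tuple tree and the RecordingContext stand-in (used so it runs without cairo) are all hypothetical, not WeasyPrint's box tree or API.

```python
from contextlib import contextmanager

# Hypothetical subset of an HTML element -> PDF structure type mapping.
STRUCT_TYPES = {
    "body": "Document", "section": "Sect", "h1": "H1", "p": "P",
    "table": "Table", "tr": "TR", "td": "TD",
}


@contextmanager
def tagged(context, element_tag):
    """Wrap drawing of one element in a matching tag_begin/tag_end pair.

    `context` is assumed to provide pycairo's Context.tag_begin(name, attributes)
    and Context.tag_end(name). Unknown elements become "Div".
    """
    name = STRUCT_TYPES.get(element_tag, "Div")
    context.tag_begin(name, "")  # structure tags need no attributes here
    try:
        yield
    finally:
        context.tag_end(name)


def draw_element(context, element):
    """Draw a hypothetical (tag, children) tree so PDF tags nest like the HTML."""
    tag, children = element
    with tagged(context, tag):
        for child in children:
            if isinstance(child, str):
                # A real cairo renderer would call move_to() before showing text.
                context.show_text(child)
            else:
                draw_element(context, child)


class RecordingContext:
    """Hypothetical stand-in for cairo.Context that only records calls."""

    def __init__(self):
        self.calls = []

    def tag_begin(self, name, attributes):
        self.calls.append(("begin", name))

    def tag_end(self, name):
        self.calls.append(("end", name))

    def show_text(self, text):
        self.calls.append(("text", text))


ctx = RecordingContext()
draw_element(ctx, ("body", [("h1", ["Title"]), ("p", ["Hello"])]))
```

Using a context manager with try/finally guarantees that every tag_begin gets its matching tag_end, which is what keeps nested sections and table structures balanced.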
So I just looked into the code: it was very straightforward to put a few calls to context.tag_begin and tag_end into draw.py, with the appropriate mappings from element_tag.
Would you be willing to share a snippet that explains how to implement this strategy, @JohannesMunk?
With the Cairo dependency removed, are there any plans to support tag structures in WeasyPrint/pydyf?
Any update on this feature? @JohannesMunk, can you please share the changes you've made in draw.py?
Cairo has been removed and we now have a custom PDF writer, so we technically have everything needed to add this feature. It’s not on the roadmap yet, but we’d be happy to hear about the different use cases users may have. Don’t hesitate to add comments to this issue if you want.
Is there any news about this issue? We are really interested in this feature. Thanks!
We would really like to work on this feature (that would be really nice for accessibility), but it’s quite complex and would require some time to choose the right API, define what’s actually supported and, of course, implement the feature.
After version 55 is released (soon!), we have a lot of things to do in version 56, with solid support for Flexbox and Grid and some sponsored features waiting. If any companies are interested in sponsoring this feature to get it earlier, don’t hesitate to get in touch!
Version 57 will include PDF/UA support, with accessibility tags automatically generated from the HTML structure of the document. If anyone is interested in this feature, don’t hesitate to test the current master branch and add a comment here!
Hello @liZe! Can you clarify how I can test PDF/UA support? I'm using the free PAC program to test PDF/UA support.
How can I enable this setting?
Hello @bandirom,
Did you set pdf-variant to pdf/ua-1 to enable PDF/UA generation?
Probably not :(. Can you help with that? How do I set it?
If you’re using WeasyPrint from the command line, you can set the --pdf-variant option like this: weasyprint --pdf-variant=pdf/ua-1 document.html document.pdf
If you’re calling write_pdf() in your code, you can set the variant parameter to "pdf/ua-1".
@codekiln @malnajdi @malnajdi @JohannesMunk @noelleleigh @dariux @duklin Now that PDF/UA includes accessibility tags, is that sufficient for your needs?
I'm testing my PDF file and Adobe shows an almost good result. I'm checking the detected issues.
I tested a table, and I don't understand what the empty boxes are. If I manually remove these boxes, the tests pass.
Can you help?
UPD: even if I make the table empty, like <table></table>, I still get these 2 boxes.
Thanks a lot for your report. Could you please open a new, separate issue saying that tags are broken with tables (with a link to your comment; there’s no need to copy the text/images)?
Don’t hesitate to add a comment here if there’s something else to add to the current PDF/UA export.
If you have found a bug (invalid documents, problems with the structure), please open a separate issue!