pdf-writer
pdf-writer copied to clipboard
Feature: Interactive Form Dictionary (PDF 1.7 Section 12.7.2) & Appearance Streams (12.5.5)
This PR adds the writers and types for writing the document catalog /AcroForms dictionary, as well as appearance streams. This PR also cleans up and unifies the API and documentation of #20, #21, #22 and #23.
It finalizes the support for interactive forms (save for signature fields), allowing work on typst to begin.
I left out signature fields because the implementation and review should happen in isolation. I want to make sure the implementation of those is sound, as it comes with a higher risk.
I should also note that this has not been resolved and should be before this is merged: https://github.com/typst/pdf-writer/blob/28b79dda4d85b06a3900fde88df681db907fc0b9/src/forms.rs#L292-L300
@reknih I'll leave this in a draft to avoid that this is merged before all important points have been resolved. But I'd say it's ready for review. I want to actually try this out before it merged too, I'll mark it as ready when I added an example.
Blocked by #26.
Currently, Annotation::appearance_state takes a Name, while Field::radio_value takes a RadioState which resolves to either to a custom /Name which is unique (or should be) across its child widget annotations, or /Off. I'm thinking that using RadioState for the annotation feels a bit like leaking the radio button impl there.
I found a few documentation errors that I fixed locally but haven't pushed yet.
Overall, I'm stuck on testing behavior, which just like the appearance is wildly different for each PDF viewer.
It appears that the optional merging of widget annotations and terminal fields (Section 12.5.6.19) is, in fact, not optional, but required by most viewers to make radio buttons work. This means that fields need an additional function that turns them into Annotations.
So while the PR is technically spec compliant so far, almost no reader will accept the fields if you don't merge them with their annots. Thankfully most radio specific attributes are easy to write to an annotation using Deref<Target = Dict>, but it's not pretty to have to fall back to this to get a working PDF.
I have rebased to incorporate #26. The following decisions must be made before merge:
- Should
/NeedAppearancesbe included (see my review comment) - What about this, I couldn't find what they mean by a character here, but it's probably not the rust equivalent pdf-writer/src/forms.rs#L292-L300
- Are we so far ok with the API, i.e. using prefixes for specializations of methods, etc.
With the changes in this PR (and #26) pdf-writer is able to create PDF form fields which are accepted by the most common readers:
Full support:
- PDF Acrobat
Almost Full support:
- chromium
- firefox
Partial support:
- evince
- okular
The lack of support seems to be a reader problem as, for some form field configurations neither forms created in accordance to the spec, not those created by Acrobat Pro behaved correctly, but had somewhat correct appearance.
I can add an example if that is desired.
- Did you need
NeedAppearancesfor any of your form test implementations? - I think they mean the number of bytes written, because section 7.2.2 states that "[regular characters] include bytes that are outside the ASCII character set", implying bytes are characters. This is also the reasonable interpretation that will fit the least characters. Have you investigated how readers interpret this, i.e. with four-byte (in UTF-8) CJK or emojis?
- I'm happy with the API!
An example would be good, I think.
- Did you need NeedAppearances for any of your form test implementations?
The PDF generated by Adobe Acrobat Pro did not use it anywhere as it came with default appearances. And I would generally also provide default appearances in the typst impl later on anyway.
- Have you investigated how readers interpret this, i.e. with four-byte (in UTF-8) CJK or emojis?
It seems to depend on what the implementation sees as a character, for chromium and firefox with /MaxLen 1 it seems to work like this:
| descr | input | bytes | allowed |
|---|---|---|---|
| 2 byte latin sequence | รก |
c3 a1 |
yes |
| 3 byte cjk sequence | ๅ |
e5 8e 9f |
yes |
| 4 byte single emoji sequence | ๐ณ |
f0 9f 8f b3 |
no |
| 12 byte composite emoji sequence | ๐ณ๏ธโ๐ |
f0 9f 8f b3, ef b8 8f e2, 80 8d f0 9f, 8c 88 |
no |
So it's neither a byte nor a codepoint nor a grapheme?
- I'm happy with the API!
I'll add some additional methods I noticed would be handy when preparing the example.
EDIT: see https://github.com/typst/pdf-writer/issues/27#issuecomment-1808611819 regarding the API, specifically subtypes vs method prefixes.
Is it perhaps UTF-16 len or something like that? I believe PdfDocEncoding is UTF-16.
Is it perhaps UTF-16 len or something like that? I believe PdfDocEncoding is UTF-16.
~~Wouldn't that disallow a 3-byte sequence?~~
It seems the encoded string was UTF-16BE, one of the valid encodings for text strings, in which ๅ is just a single code point (539f), whereas ๐ฌ is two (d822, fd00), which does not work despite being a single grapheme so that clearly answers my question.
- Did you need NeedAppearances for any of your form test implementations?
The PDF generated by Adobe Acrobat Pro did not use it anywhere as it came with default appearances. And I would generally also provide default appearances in the typst impl later on anyway.
Okay, so we don't plan to use it, neither does Acrobat and its getting deprecated. I say drop it.
After some painful debugging, I got radio buttons to also work, I've added Annotation::page while testing, it's required for some annotation actions, none that we need, but we may as well add it.
Something that doesn't work however, but is an upstream issue, is that for checkboxes pdf.js will respect the RADIOS_IN_UNISON flag, but not for radio buttons. pdfium handles this correctly, and so does Acrobat.
Dropdowns also work as expected. The following work remains:
- [ ] Test more complex appearance streams
- [ ] Test printing
- [ ] Signature fields (likely a different PR as stated before)
You can view the new example to generate a PDF with some form fields. I'll change to use the writer.finish() version once I've tested everything to completion, this was easier for me to write. Feel free to test this in various readers, the feature set of forms is so large that I could've missed edge cases.
I have a hard time to find a consistent way to get appearances to show up in most viewers, it seems they almost all support XFA forms very well, but once normal forms are used they don't properly render the appearances if at all.
Okular refuses to not show fields with their names and a simple button, evince simply shows nothing making them impossible to find on a blank page, chrome renders at least some of them correctly, firefox kind of also provides its own defaults like okular does, but they fit better with the rest.
Sadly, it seems I haven't made much progress on this, and I can't test printing if I don't see anything in the PDF first.
Hey I found some time to test this on Mac:
Review
Mac Preview/safari/Skim
- radio buttons are buggy: the first radio also toggles the third one, which is usually caused by the first and third radio having the same label (I noticed that working with LaTeX)
- buttons don't have any background set - so finding them is really tricky
firefox
- radio buttons work independently, but look like odd googly eyes
- the reset button appears slightly yellow on hover but is still hard to discover
chrome/microsoft edge (on mac)
- radio buttons appear as rectangles and have checkmarks instead of radio button look, also first and third button go on and off together
- reset button has a border now
Thoughts
I think that most of the difficulty with the appearance is the fallback behaviour of each pdf reader - so I think the best way would be to find a good set of default values to set like border, background color etc. to avoid having to deal with fallbacks and defaults.
Instead of digging into the PDF standard I started looking at LaTeX - bc. I know the docs are usually okay and I found this PDF documenting the hyperref package - the relevant bits are in section 9 PDF and HTML. in combination with tex.stackoverflow I was able to implement some forms in the past.
If it helps I can create some sample pdfs that have a certain behaviour/look to have a target implementation to aim for.
radio buttons are buggy: the first radio also toggles the third one, which is usually caused by the first and third radio having the same label (I noticed that working with LaTeX)
This is intentional, I've used the RADIOS_IN_UNISON flag, meaning that those with the same appearance stream names (1 and 3) refer to the same value, the behavior of Firefox is actually buggy here.
I think that most of the difficulty with the appearance is the fallback behaviour of each pdf reader - so I think the best way would be to find a good set of default values to set like border, background color etc. to avoid having to deal with fallbacks and defaults.
Yeah, that's what I've tried around with, but I couldn't get it to work, I assume I need to read up more on general annotation appearance.
If it helps I can create some sample pdfs that have a certain behaviour/look to have a target implementation to aim for.
I wouldn't mind, every PDF helps to get a clearer picture!
Thank you for testing it!
just tell me what pdfs you need, I would suggest one for buttons with several different options, one with dropdowns, etc.
I'll share the sources and the resulting pdfs somewhere publicly so you can play around with them. and link them to the pr
let me know if you want something specific to test.
On Sun, 14 Jan 2024, 11:57 tingerrr, @.***> wrote:
radio buttons are buggy: the first radio also toggles the third one, which is usually caused by the first and third radio having the same label (I noticed that working with LaTeX)
This is intentional, I've used the RADIOS_IN_UNISON flag, meaning that those with the same appearance stream names (1 and 3) refer to the same value, the behavior of Firefox is actually buggy here.
I think that most of the difficulty with the appearance is the fallback behaviour of each pdf reader - so I think the best way would be to find a good set of default values to set like border, background color etc. to avoid having to deal with fallbacks and defaults.
Yeah, that's what I've tried around with, but I couldn't get it to work, I assume I need to read up more on general annotation appearance.
If it helps I can create some sample pdfs that have a certain behaviour/look to have a target implementation to aim for.
I wouldn't mind, every PDF helps to get a clearer picture!
Thank you for testing it!
โ Reply to this email directly, view it on GitHub https://github.com/typst/pdf-writer/pull/25#issuecomment-1890930865, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFAJ3JKWGEMGIHROA6P26LYOPB3DAVCNFSM6AAAAAA6RDUTB6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJQHEZTAOBWGU . You are receiving this because you commented.Message ID: @.***>
I'll likely come back to this when I have more time to work on forms. Currently, typst-test and university are taking up most of my time.
The last commit adds some examples for specifying appearances, unfortunately there are far too many ways to do this to meaningfully cover them in the forms example.
While the checkboxes may not look very different, this is actually now an explicit implementation replicating the chrome default behavior (although with slightly different spacing).
Unfortunately, evince still refuses to render my checkboxes correctly, whereas the example file I looked at works well on both chrome and evince. This is basically my last pain point, I'm not sure what's missing for evince to render this correctly.
EDIT: This now also works on evince, the lack of required bbox for the FormXObject threw it off. Okular behaves the same way as my reference file, which is unfortunately not very well, but it is according to reference, so I'm tempted to blame okular here.
I think the example needs some polishing, what vexes me is how much redundant information has to be inserted for most viewers to accept this.
It seems, though, that the implementation itself is sufficiently complete to be able to start work on the typst side for the common field types.
I'll mark this ready for review.