
Fields and Widgets, Widgets and Fields.

Open faceless2 opened this issue 1 year ago • 6 comments

It was inevitable this was going to come up at some point.

First, I'm assuming a processing model in which a node in the PDF can be of more than one type. Traverse to a combined field+widget from Fields? It's validated as a Field. Traverse from a Page? It's also validated as a Widget. Everything below assumes that model; if that's not how you do it, I guess you can ignore the whole thing.


Currently there are 3 types: Field (an untyped field with no FT), FieldNNN (a typed field with FT) and AnnotWidget. There is a single type for a list of these items, ArrayOfFields, which is used for both Fields in the Form and Kids in the Fields. It's a list of: [FieldTx,FieldBtn,FieldCh,FieldSig,Field,AnnotWidget] - I'm ignoring the predicate for FieldSig.

This means that we have the following allowed behaviour:

  1. The form can contain a Fields array that references a widget that has no field (either combined or as a parent)
  2. A widget can belong to a Field with no FT, or belong to no field at all.
  3. The form Fields array can point to elements with a Parent
  4. There is no requirement for consistency between the Parent and Kids arrays
  5. If a Field is combined with a widget, there is no check to ensure it has no Kids
  6. There is no requirement for a Field to have any Widgets.

I think all of those are disallowed (happy to justify if required), so here's a proposal to remedy this.

To fix the first two issues you could split ArrayOfFields into two types, ArrayOfFields and ArrayOfFieldsOrWidgets. Your types then look like:

Form
  Fields [ArrayOfFields]

Field
  Parent [Field,FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [ArrayOfFields]

FieldTx, FieldCh etc
  Parent [Field,FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [ArrayOfFieldsOrWidgets]

AnnotWidget
  Parent [FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [none - it's currently defined as ArrayOfFields, but should be removed]

ArrayOfFields
  * [Field,FieldTx,FieldCh,FieldBtn,FieldSig]

ArrayOfFieldsOrWidgets
  * [FieldTx,FieldCh,FieldBtn,FieldSig,AnnotWidget]
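To make the effect of the split easier to see, here is a minimal Python sketch of the proposed type table. It is not how the Arlington model stores anything; the dictionary layout and the `allowed_children` helper are hypothetical names invented for illustration, and only the type names come from the proposal above.

```python
# Hypothetical encoding of the proposed type table (illustration only).
TYPED_FIELDS = ["FieldTx", "FieldCh", "FieldBtn", "FieldSig"]

# The two array types from the proposal.
ARRAY_TYPES = {
    "ArrayOfFields": ["Field"] + TYPED_FIELDS,
    "ArrayOfFieldsOrWidgets": TYPED_FIELDS + ["AnnotWidget"],
}

# Which array type each owner's Fields/Kids key would use under the proposal.
# None means the key is not allowed at all (AnnotWidget has no Kids).
CHILD_ARRAY = {"Form": "ArrayOfFields", "Field": "ArrayOfFields", "AnnotWidget": None}
CHILD_ARRAY.update({name: "ArrayOfFieldsOrWidgets" for name in TYPED_FIELDS})


def allowed_children(owner_type):
    """Return the element types permitted in the owner's Fields/Kids array."""
    array_type = CHILD_ARRAY[owner_type]
    return [] if array_type is None else list(ARRAY_TYPES[array_type])
```

With this table, `allowed_children("Form")` no longer includes AnnotWidget (issue 1), and a widget can only appear under a typed field (issue 2).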

The remaining issues can be handled with some magic in your SpecialCase field - we need to check

  • if we have a Parent, we're in the Parent's Kids
  • if we don't have a Parent, we're in the Fields array in the Form
  • if we are a terminal field and are not combined with a widget, we have one or more widgets
  • if we are a terminal field and are combined with a widget, we have no Kids

because the rules for Fields are:

Parent - (Required if this field is the child of another in the field hierarchy; absent otherwise) The field that is the immediate parent of this one (the field, if any, whose Kids array includes this field). A field can have at most one parent; that is, it can be included in the Kids array of at most one other field.

Kids - In a non-terminal field, the Kids array shall refer to field dictionaries that are immediate descendants of this field. In a terminal field, the Kids array ordinarily shall refer to one or more separate widget annotations that are associated with this field. However, if there is only one associated widget annotation, and its contents have been merged into the field dictionary, Kids shall be omitted.

and for Widgets:

Parent - (Required if this widget annotation is one of multiple children in a field; optional otherwise) An indirect reference to the widget annotation’s parent field. A widget annotation may have at most one parent; that is, it can be included in the Kids array of at most one field.

I think we can represent all that with an fn:Eval that looks like this (expanded to make it a bit more legible):

(
 ((@Parent==null) && (fn:InArray(trailer::Root::AcroForm::Fields))) ||
 ((@Parent!=null) && (fn:InArray(parent::Kids)))
) && (
 ((@Subtype==Widget) && (Kids==null)) ||
 ((@Subtype==null) && (fn:ArraySize(Kids)>0))
)

It's using /Subtype/Widget as the test for "is a widget", which is not quite right, and I've also just invented fn:InArray, and presumed that ==null is the same as "field is not there" - which probably isn't the case. However I think the logic is correct.
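For anyone who wants to poke at the logic outside the predicate grammar, here is a rough Python sketch of the same expression. It assumes PDF objects are plain dicts, that Parent holds a direct reference to the parent dict, and that object identity stands in for "same indirect object". The names `check_node` and `contains` are made up for this sketch, and it keeps the same simplifications as the expression (/Subtype /Widget as the widget test, a missing key treated as null).

```python
def contains(array, node):
    """True if node is an element of array; identity stands in for 'same indirect object'."""
    return any(item is node for item in (array or []))


def check_node(node, form_fields):
    """Mirror the proposed fn:Eval for one field or widget dictionary.

    form_fields plays the role of trailer::Root::AcroForm::Fields.
    """
    parent = node.get("Parent")
    kids = node.get("Kids")
    subtype = node.get("Subtype")

    # First half: the node is reachable from where its Parent key says it should be.
    if parent is None:
        linked = contains(form_fields, node)
    else:
        linked = contains(parent.get("Kids"), node)

    # Second half: a widget has no Kids, a non-widget has at least one.
    shape_ok = (subtype == "Widget" and kids is None) or (
        subtype is None and kids is not None and len(kids) > 0
    )
    return linked and shape_ok
```

A terminal field with one separate widget passes for both objects, while a widget whose Parent does not list it in Kids fails the first half, and a field with no Kids and no merged widget fails the second half.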

Finally, as an alternative if you don't want to go crazy with the special case field, I think we could capture the same logic by splitting FieldTx into lots of subtypes, e.g. FieldTxNonTerminal, FieldTxTerminal, FieldTxTerminalCombined etc, with the same for the other field types. It's more declarative but explodes the number of types.

Sorry, that's a rough one to start the day with.

faceless2, Sep 28 '22 16:09