arlington-pdf-model
Fields and Widgets, Widgets and Fields.
It was inevitable this was going to come up at some point.
First, I'm assuming a processing model in which a node in the PDF can be of more than one type. Traverse to a combined field+widget from Fields? It's validated as a Field. Traverse to it from a Page? It's also validated as a Widget. Everything below assumes that model; if that's not how you do it, I guess you can ignore the whole thing.
Currently there are 3 types: Field (an untyped field with no FT), FieldNNN (a typed field with FT) and AnnotWidget. And there is a single type for a list of these items, ArrayOfFields, which is used for both Fields in the Form and Kids in the Fields. It's a list of: [FieldTx,FieldBtn,FieldCh,FieldSig,Field,AnnotWidget] - I'm ignoring the predicate for FieldSig.
This means that we have the following allowed behaviour:
- The form can contain a Fields array that references a widget that has no field (either combined or as a parent)
- A widget can belong to a Field with no FT, or belong to no field at all.
- The form Fields array can point to elements with a Parent
- There is no requirement for consistency between the Parent and Kids arrays
- If a Field is combined with a widget, there is no check to ensure it has no Kids
- There is no requirement for a Field to have any Widgets.
I think all of those are disallowed (happy to justify if required), so here's a proposal to remedy this.
To fix the first two issues you could split ArrayOfFields into two types: ArrayOfFields and ArrayOfFieldsOrWidgets. Your types then look like
Form
  Fields [ArrayOfFields]
Field
  Parent [Field,FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [ArrayOfFields]
FieldTx, FieldCh etc
  Parent [Field,FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [ArrayOfFieldsOrWidgets]
AnnotWidget
  Parent [FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [none - it's currently defined as ArrayOfFields, but should be removed]
ArrayOfFields
  * [Field,FieldTx,FieldCh,FieldBtn,FieldSig]
ArrayOfFieldsOrWidgets
  * [FieldTx,FieldCh,FieldBtn,FieldSig,AnnotWidget]
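To make the allowed combinations easy to poke at, here is a minimal Python sketch of the proposed split as plain data. The dict layout and the helper name `allowed` are made up for illustration only; this is not how the Arlington model is stored or evaluated.

```python
# Hypothetical encoding of the proposed types; the type names mirror the
# listing in the proposal, everything else is illustrative.
TYPED_FIELDS = ["FieldTx", "FieldCh", "FieldBtn", "FieldSig"]

ARRAYS = {
    "ArrayOfFields": ["Field"] + TYPED_FIELDS,
    "ArrayOfFieldsOrWidgets": TYPED_FIELDS + ["AnnotWidget"],
}

# (owner type, key) -> array type the key links to.
LINKS = {
    ("Form", "Fields"): "ArrayOfFields",
    ("Field", "Kids"): "ArrayOfFields",
    ("AnnotWidget", "Kids"): None,  # key removed in the proposal
}
for typed in TYPED_FIELDS:
    LINKS[(typed, "Kids")] = "ArrayOfFieldsOrWidgets"


def allowed(owner, key, element_type):
    """Return True if `element_type` may appear in `owner`'s `key` array."""
    array_type = LINKS.get((owner, key))
    if array_type is None:
        return False
    return element_type in ARRAYS[array_type]
```

Under this encoding a bare widget can no longer sit directly in the form's Fields array or under an untyped Field, which is what the first two issues need.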
The remaining issues can be handled with some magic in your SpecialCase field - we need to check:
- if we have a Parent, we're in the Parent's Kids
- if we don't have a Parent, we're in the Fields array in the Form
- if we are a terminal field and are not combined with a widget, we have one or more widgets
- if we are a terminal field and are combined with a widget, we have no Kids
because the rules for Fields are:
Parent - (Required if this field is the child of another in the field hierarchy; absent otherwise) The field that is the immediate parent of this one (the field, if any, whose Kids array includes this field). A field can have at most one parent; that is, it can be included in the Kids array of at most one other field.
Kids - In a non-terminal field, the Kids array shall refer to field dictionaries that are immediate descendants of this field. In a terminal field, the Kids array ordinarily shall refer to one or more separate widget annotations that are associated with this field. However, if there is only one associated widget annotation, and its contents have been merged into the field dictionary, Kids shall be omitted.
and for Widgets:
Parent - (Required if this widget annotation is one of multiple children in a field; optional otherwise) An indirect reference to the widget annotation’s parent field. A widget annotation may have at most one parent; that is, it can be included in the Kids array of at most one field
I think we can represent all that with an fn:Eval that looks like this (expanded to make it a bit more legible):
(
((@Parent==null) && (fn:InArray(trailer::Root::AcroForm::Fields))) ||
((@Parent!=null) && (fn:InArray(parent::Kids)))
) && (
((@Subtype==Widget) && (Kids==null)) ||
((@Subtype==null) && (fn:ArraySize(Kids)>0))
)
It's using /Subtype/Widget as the test for "is a widget", which is not quite right, and I've also just invented fn:InArray, and presumed that ==null is the same as "field is not there" - which probably isn't the case. However I think the logic is correct.
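For anyone who wants to test the logic outside the model, here is a rough Python transcription of that expression. It assumes nodes are plain dicts, that a missing key stands in for "not there", and that array membership is by object identity (standing in for "same indirect object"). The names `in_array` and `check_field` are invented for this sketch, and it inherits the same "Subtype is Widget" shortcut.

```python
def in_array(node, array):
    """True if `node` itself (by identity) is an element of `array`."""
    return any(item is node for item in (array or []))


def check_field(node, acroform_fields):
    """Mirror of the proposed expression for a single field or widget node."""
    parent = node.get("Parent")
    if parent is None:
        # No Parent: must be listed in the form's Fields array.
        placed = in_array(node, acroform_fields)
    else:
        # Has a Parent: must be listed in that Parent's Kids.
        placed = in_array(node, parent.get("Kids"))

    kids = node.get("Kids")
    subtype = node.get("Subtype")
    if subtype == "Widget":
        # Widget, or field merged with its widget: no Kids allowed.
        shaped = kids is None
    elif subtype is None:
        # Plain field: needs at least one kid.
        shaped = kids is not None and len(kids) > 0
    else:
        shaped = False
    return placed and shaped


# Small example: one text field with a single separate widget.
field = {"FT": "Tx"}
widget = {"Subtype": "Widget", "Parent": field}
field["Kids"] = [widget]
fields = [field]
print(check_field(field, fields), check_field(widget, fields))
```

A widget with no Parent that is not in the form's Fields array fails the first half, and a merged field+widget that still carries Kids fails the second.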
Finally, as an alternative if you don't want to go crazy with the special case field, I think we could capture the same logic by splitting FieldTx into lots of subtypes, e.g. FieldTxNonTerminal, FieldTxTerminal, FieldTxTerminalCombined etc., with the same for the other field types. It's more declarative, but it explodes the number of types.
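As a rough feel for the size of that explosion, this sketch enumerates the names using only the three suffixes mentioned above and the four typed fields. The real split might need more variants, so treat the count as a lower bound under that assumption.

```python
from itertools import product

# The four typed fields, plus only the three suffixes named in the text.
FIELD_TYPES = ["FieldTx", "FieldCh", "FieldBtn", "FieldSig"]
ROLES = ["NonTerminal", "Terminal", "TerminalCombined"]

# One type name per (field type, role) pair.
split_types = [base + role for base, role in product(FIELD_TYPES, ROLES)]

print(len(split_types))
print(split_types[:3])
```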
Sorry, that's a rough one to start the day with.