asdf icon indicating copy to clipboard operation
asdf copied to clipboard

Add option to disable inline representation of tagged field content when writing to ASDF file

Open ketozhang opened this issue 2 years ago • 9 comments

I do not see this in the docs, but currently when writing to ASDF file, all ASDF tags are written inline if it is not nested (i.e., all children are leaf nodes).

Current Behavior:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.14.3}

Desired Behavior:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 
  author: The ASDF Developers
  homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, 
  version: 2.14.3

Can we add an option to disable this and I much prefer this to be the default.

ketozhang avatar Mar 24 '23 00:03 ketozhang

Here's an example what happens if the tagged field has a child field

node: !<asdf://example.com/tags/node-1.0.0>
  child: !<asdf://example.com/tags/node-1.0.0> {child: null}

This looks a bit silly

ketozhang avatar Mar 24 '23 00:03 ketozhang

This should be controlled by the flowStyle setting in the schema describing the tag. See https://asdf-standard.readthedocs.io/en/1.5.0/schemas/yaml_schema.html#Schema%20Definitions for details. Not that in your case I believe you need to set flowStyle: block in the schema.

It should work as the Roman asdf files make extensive use of this. For example if one generates and saves a wfi_image, its schema contains flowStyle: block here. A sample header is:

roman: !<asdf://stsci.edu/datamodels/roman/tags/wfi_image-1.0.0>
  meta:
    aperture: !<asdf://stsci.edu/datamodels/roman/tags/aperture-1.0.0>
      name: WFI_06_FULL
      position_angle: 30.149604523080143
    cal_step: !<asdf://stsci.edu/datamodels/roman/tags/cal_step-1.0.0>
      assign_wcs: INCOMPLETE
      flat_field: SKIPPED
      dark: COMPLETE
      dq_init: COMPLETE
      jump: COMPLETE
      linearity: SKIPPED
      photom: COMPLETE
      ramp_fit: COMPLETE
      saturation: INCOMPLETE
    calibration_software_version: !<asdf://stsci.edu/datamodels/roman/tags/calibration_software_version-1.0.0> Version
      94bb92d0a922867c94ed1fddcb4216a5
    coordinates: !<asdf://stsci.edu/datamodels/roman/tags/coordinates-1.0.0>
      reference_frame: ICRS
    crds_context_used: roman_8226.pmap
    crds_software_version: 63.1.67
    ephemeris: !<asdf://stsci.edu/datamodels/roman/tags/ephemeris-1.0.0>
      earth_angle: 1.301915411283766
      moon_angle: 1.1516403400836825
      sun_angle: 2.736255246678112
      type: PREDICTED
      time: 59865.65463414276
      ephemeris_reference_frame: Frame 3706a1f3
      spatial_x: -1.3623950365305825e+308
      spatial_y: -1.60804538172114e+306
      spatial_z: 1.5921914973582028e+307
      velocity_x: 1.3125923623238684e+308
      velocity_y: -1.6632717836035864e+308
      velocity_z: 2.352229275461757e+307

I believe yaml would have normally chosen to represent the part under aperture in the way you describe if the schemas involved did not specify the flowStyle: block.

WilliamJamieson avatar Mar 24 '23 02:03 WilliamJamieson

How would I control this without defining a schema for the tag?

One may want to overwrite all tag's preference after all this is just for the serialization and does not affect deseralization.

ketozhang avatar Mar 24 '23 02:03 ketozhang

This is currently not supported outside of setting the style within the tag definition. However, looking at the early history, adjusting the default for serialization was planned but never finished, see:

https://github.com/asdf-format/asdf/blob/d33ca09e62769a7e92e43af7ca3ae61eeb2e14ef/asdf/yamlutil.py#L90-L95

Please feel free to explore how to do this further if you wish and open PR adding the feature if you figure out how to do it.

Now I do have some broad thoughts on how to accomplish this. Note that the flowStyle key is read into this class attribute:

  • https://github.com/asdf-format/asdf/blob/d33ca09e62769a7e92e43af7ca3ae61eeb2e14ef/asdf/tagged.py#L59
  • https://github.com/asdf-format/asdf/blob/d33ca09e62769a7e92e43af7ca3ae61eeb2e14ef/asdf/tagged.py#L87

For the two wrapper objects that are used to wrap tagged objects in the ASDF tree. This is then read by the yaml dumper routines:

  • https://github.com/asdf-format/asdf/blob/d33ca09e62769a7e92e43af7ca3ae61eeb2e14ef/asdf/yamlutil.py#L57
  • https://github.com/asdf-format/asdf/blob/d33ca09e62769a7e92e43af7ca3ae61eeb2e14ef/asdf/yamlutil.py#L51

for the two wrapper types above. Thus if you examine the raw ASDF tree, once its been converted into data by the converters, you should be able to set the flow_style keyword on the tagged objects you wish to effect and then when that tree is passed through to pyyaml for dumping using the AsdfDumper everything should get correctly read and adjusted accordingly.

How and when to access will require more in-depth investigation and experimentation.

WilliamJamieson avatar Mar 24 '23 13:03 WilliamJamieson

Very informative, thank you! I will take a stab it this.

First, ASDF is using whatever default the pyyaml dumper is using. This is fine as-is, but if ASDF has a preference, the change seems simple and this would not ignore the tag definition schema preference.

- flow_style = _flow_style_map.get(mapping.flow_style, None)
+ flow_style = _flow_style_map.get(mapping.flow_style, _flow_style_map['block'])

For adding user overrides of flow style, I'm seeing this would have to come from the configs elsewhere as the function signature for the representers will not support this (unless I'm mistaken and the object type for data/mapper/sequence do contain user configs).

Lastly, what are the interfaces the users can use to set stylistics preferences. Is it just AsdfFile.write_to() and/or config (e.g., asdf.get_config())?

ketozhang avatar Mar 24 '23 18:03 ketozhang

This block style is not necessarily ideal, so I disagree with forcing the default to be "block".

Since ASDF uses pyyaml to write the yaml output to the header, it was decided that we would just let pyyaml decide how best to format the yaml rather than making a single decision. pyyaml uses a heuristic to decide when to switch from "flow" style to "block" style, and we felt like this seemed appropriate way to standardize the decision. Moreover, if users of ASDF disagree with this approach they can simply configure their schemas appropriately.

In so far as, enabling user overrides placing the interface in the asdf-config is a fine idea, this can even enable the AsdfFile.write_to() to set the overrides by simply letting it use the config context to temporally modify the configuration for a given write.

WilliamJamieson avatar Mar 24 '23 18:03 WilliamJamieson

Note that "inlining" arrays (storing them in the yaml directly, instead of in a binary block) operates on the same principle as the flow_style, except it has a global configuration option which passes through to a write option.

The global setting for this is controlled by the all_array_storage setting in the AsdfConfig: https://github.com/asdf-format/asdf/blob/8dce02c664a9acbb7fd3e5335f5f373681ef10fc/asdf/config.py#L325-L347

I believe something like global_flow_style could be done in a similar way.

WilliamJamieson avatar Mar 24 '23 18:03 WilliamJamieson

Here is where all_array_storage passes into the writing of the file: https://github.com/asdf-format/asdf/blob/8dce02c664a9acbb7fd3e5335f5f373681ef10fc/asdf/asdf.py#L1188-L1190

WilliamJamieson avatar Mar 24 '23 18:03 WilliamJamieson

This block style is not necessarily ideal, so I disagree with forcing the default to be "block". pyyaml uses a heuristic

Okay I won't push further, but I will say it's not much of a heuristic -- it looks to be just a consistency rule:

By default, PyYAML chooses the style of a collection depending on whether it has nested collections. If a collection has nested collections, it will be assigned the block style. Otherwise it will have the flow style. -- PyYAML


Thank you, the existing config for array serialization styling helps a lot.

ketozhang avatar Mar 24 '23 18:03 ketozhang