Changing a tag schema resource after Validators creation

Open marscher opened this issue 4 years ago • 6 comments

After the first call to AsdfFile.validate(), the schema used to validate a specific tag can no longer be changed. This happens even though the config context that defines the resource mapping is local to each validate call. I assume this is due to the caching introduced for validator creation (in asdf-2.8): once the validators have been created, changes to the resource config are not picked up.

The only workaround would be to manually flush the cache each time the resource mapping changes. But I guess this will introduce a performance penalty.
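
The suspected mechanism can be illustrated with a minimal, library-independent sketch. It does not use asdf's internals: `resources`, `get_validator`, the tag URI, and the `metadata` field are all hypothetical stand-ins, and the "validator" only checks required keys.

```python
from functools import lru_cache

# Hypothetical stand-in for a resource mapping: tag URI -> required keys.
# This is NOT asdf's real data structure, only an illustration.
resources = {"tag:example.org/thing-1.0.0": ("name",)}


@lru_cache(maxsize=None)
def get_validator(tag):
    """Build a validator for `tag`; the result is cached per tag."""
    required = resources[tag]  # read once, then captured by the closure

    def validate(node):
        return all(key in node for key in required)

    return validate


tag = "tag:example.org/thing-1.0.0"
node = {"name": "sample"}

first = get_validator(tag)(node)  # validator is created and cached here

# Swap in a stricter "schema" after the validator already exists.
resources[tag] = ("name", "metadata")
stale = get_validator(tag)(node)  # the cached validator ignores the change

# Flushing the cache forces the validator to be rebuilt.
get_validator.cache_clear()
fresh = get_validator(tag)(node)

print(first, stale, fresh)
```

In this sketch the stricter requirement only takes effect after `cache_clear()`, which mirrors the manual-flush workaround described above, including the cost of rebuilding every cached validator.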

marscher avatar Jul 14 '21 09:07 marscher

This isn't a use case that we considered, can you tell us more about what you're doing? To my mind, a tag is a guarantee of a particular structure, so it should always validate against the same schema.

eslavich avatar Jul 14 '21 13:07 eslavich

We want to allow our users to change or extend the schemas we provide, e.g. by adding required attributes.

marscher avatar Jul 15 '21 06:07 marscher

If new attributes of a Python object are added to the schema and the intention is to write them out to the ASDF file, then the extension/converter that serializes and deserializes the object has to change too, right? The schema and the converter should be versioned and kept in lock-step if they use the same tag, so that all previously written data can be read back in using the correct tag.

If you want users to be able to extend/change the schema/converter you are providing, then they should probably use a separate tag, provide their own code/schemas via an extension, and an entry point to install it.

jdavies-st avatar Jul 15 '21 16:07 jdavies-st

The idea behind this and what we are trying to use it for is described here: https://weldx.readthedocs.io/en/v0.4.1/tutorials/quality_standards.html

In short, we want to provide a method for users to quickly validate existing files with existing tags/objects against modified tags/schemas. Those modified tags should be more restrictive than the 'base' tag/schema definition (for example, they could require an additional metadata field that is not present in the original tag schema). You could think of these updated tags as an updated fork of the original extension module.

Ideally we would like to achieve this without the work of creating a full new Python extension. Our approach so far is to override existing tag mappings with new resources while keeping the underlying Python implementation in place. Which of these "override" extensions were used is logged in the file upon creation.
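
The override idea can be sketched in plain Python without any asdf calls. Everything here is a hypothetical stand-in: the schema URI, the `metadata` field, and the helpers `stricter` and `missing_required`, which only look at `required` keys rather than performing full schema validation.

```python
from collections import ChainMap

# Hypothetical base resource mapping: schema URI -> schema content.
URI = "asdf://example.org/schemas/thing-1.0.0"
BASE = {URI: {"type": "object", "required": ["name"]}}


def stricter(schema, extra_required):
    """Return a copy of `schema` that requires additional keys."""
    new_schema = dict(schema)
    new_schema["required"] = list(schema.get("required", [])) + list(extra_required)
    return new_schema


def missing_required(node, schema):
    """List the required keys that `node` lacks (toy check, not full validation)."""
    return [key for key in schema.get("required", []) if key not in node]


# The override layer shadows the base mapping for the same URI,
# while the code that handles the object itself stays unchanged.
overrides = {URI: stricter(BASE[URI], ["metadata"])}
active = ChainMap(overrides, BASE)

node = {"name": "sample"}
missing_base = missing_required(node, BASE[URI])      # passes the base schema
missing_strict = missing_required(node, active[URI])  # fails the stricter one

print(missing_base, missing_strict)
```

The base mapping is left untouched, so removing the override layer restores the original behavior.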

CagtayFabry avatar Jul 22 '21 15:07 CagtayFabry

But ASDF files need to be able to round trip. If users create new files with their more restrictive schemas (for example, requiring an additional metadata field and writing that field out), then there's no way to verify that the object they write out can be read in using the existing, public tag. That tag doesn't know about the metadata field and won't know what to do with it, which breaks round tripping. The data format is no longer portable, right?

jdavies-st avatar Jul 22 '21 17:07 jdavies-st

True, that is why we will keep track of the involved packages in the same manner as existing asdf extensions.

The main focus for now is also on reading existing files with 'modified' schemas for validation. As an example, think of different simulation or processing pipelines that run on existing ASDF files but need to ensure some special data formats in some of the nested tags. We would like to be able to validate these requirements on read, purely based on ASDF validation, as a preliminary step before working with the files.

Regarding the problem of 'unknown' fields that would not get handled by the base tag implementation: since that is indeed an issue, we tried to avoid it by adding 'reserved' fields that will get picked up by any of our custom objects for these scenarios: https://weldx.readthedocs.io/en/v0.4.1/tutorials/custom_metadata.html Of course, the user will have to stick to this convention, otherwise data might simply get lost.

CagtayFabry avatar Jul 22 '21 21:07 CagtayFabry