tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Setting and querying metadata schemas to "null"

Open jeromekelleher opened this issue 2 years ago • 4 comments

Currently, if you want to query whether a schema is set you need to write:

if table.metadata_schema == tskit.MetadataSchema(schema=None):
    # do something

and to set to a null schema you need to write

table.metadata_schema = tskit.MetadataSchema(schema=None)

This is not at all obvious I think.

Could we change this to

if table.metadata_schema is None:
    # do stuff

and

table.metadata_schema = None

or would this break too much code?

I guess this is all related to what we want the high-level schema API to be (do we have an issue to track this?). I think it makes sense to say that the MetadataSchema object doesn't exist, rather than to say that the MetadataSchema is null, or something (i.e., via an is_null() method)

jeromekelleher avatar Jan 05 '22 16:01 jeromekelleher

We are doing something similar with the TreeSequence.reference_sequence, where it returns None if it's not defined, FWIW

jeromekelleher avatar Jan 05 '22 16:01 jeromekelleher

I've had a look through the metadata code, and I think the best way to support the comparison to None is to do so at the __eq__ method of the MetadataSchema class and to allow assignments to None on the property setter of the metadata holding objects, detecting the None and storing an actual schema instance.

The alternative of actually holding None means adding a load of ifs wherever we use a schema.

No code would be broken in this way, as we're only expanding the allowed arguments to the property setter to include None.

benjeffery avatar Jan 06 '22 11:01 benjeffery

That sounds like it could be quite messy - we wouldn't want to have something like

table.metadata_schema == None # True
table.metadata_schema is None # False

Unless the value returned by table.metadata_schema can really be None then we shouldn't confuse things by setting it to that value.

Hmm. So, it seems like table.metadata_schema.is_null() is the best option then?

jeromekelleher avatar Jan 06 '22 16:01 jeromekelleher

we wouldn't want to have something like

Hmm, yeah good point. I guess is_null works. I think I originally didn't think of it being possible to have a None schema, only that you can have None as the specification of a schema. This is because even tskit.MetadataSchema(schema=None) is effectively a meaningful schema that says "the object must be a byte array".

benjeffery avatar Jan 07 '22 12:01 benjeffery