tskit
tskit copied to clipboard
Setting and querying metadata schemas to "null"
Currently, if you want to query whether a schema is set you need to write:
if table.metadata_schema == tskit.MetadataSchema(schema=None):
# do something
and to set to a null schema you need to write
table.metadata_schema = tskit.MetadataSchema(schema=None)
This is not at all obvious I think.
Could we change this to
if table.metadata_schema is None:
# do stuff
and
table.metadata_schema = None
or would this break too much code?
I guess this is all related to what we want the high-level schema API to be (do we have an issue to track this?). I think it makes sense to say that the MetadataSchema object doesn't exist, rather than to say that the MetadataSchema is null, or something (i.e., via an is_null()
method)
We are doing something similar with the TreeSequence.reference_sequence
, where it returns None if it's not defined, FWIW
I've had a look through the metadata code, and I think the best way to support the comparison to None
is to do so at the __eq__
method of the MetadataSchema
class and to allow assignments to None on the property setter of the metadata holding objects, detecting the None and storing an actual schema instance.
The alternative of actually holding None
means adding a load of if
s wherever we use a schema.
No code would be broken in this way, as we're only expanding the allowed arguments to the property setter to include None
.
That sounds like it could be quite messy - we wouldn't want to have something like
table.metadata_schema == None # True
table.metadata_schema is None # False
Unless the value returned by table.metadata_schema
can really be None
then we shouldn't confuse things by setting it to that value.
Hmm. So, it seems like table.metadata_schema.is_null()
is the best option then?
we wouldn't want to have something like
Hmm, yeah good point. I guess is_null
works. I think I originally didn't think of it being possible to have a None
schema, only that you can have None
as the specification of a schema. This is because even tskit.MetadataSchema(schema=None)
is effectively a meaningful schema that says "the object must be a byte array".