tskit

Method to clear metadata

Open • hyanwong opened this issue 3 years ago • 7 comments

When messing with metadata in tables where the schema allows null entries, it is helpful to be able to clear a table's metadata column. I'm currently doing:

tables.nodes.packset_metadata([b''] * tables.nodes.num_rows)

But this seems a bit hacky and unintuitive. I wonder if it would be worth having a wrapper function to do this?
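
A minimal sketch of such a wrapper, written as a free function. The name `clear_metadata` is hypothetical here; the function only assumes the table exposes `num_rows` and `packset_metadata`, exactly as the one-liner above does, so it is duck-typed rather than tied to a particular table class.

```python
def clear_metadata(table):
    """Replace every row's metadata with empty bytes.

    Hypothetical helper: it relies only on the ``num_rows`` attribute and
    the ``packset_metadata`` method used in the one-liner above.
    """
    # One empty bytestring per row; the number of rows is unchanged.
    table.packset_metadata([b""] * table.num_rows)
```

Usage would then be `clear_metadata(tables.nodes)`, which reads more clearly than building the list of empty bytes by hand at each call site.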

hyanwong • Mar 07 '22 18:03

I'm not sure this is a common enough row-editing operation to justify special-casing it.

petrelharp • Mar 07 '22 19:03

Yes, I'm not sure. But metadata is special in a way, as it can be cleared without affecting the integrity of the tree sequence.

hyanwong • Mar 08 '22 08:03

Any ragged column could be cleared without affecting the referential integrity. The question is whether we can be bothered adding methods for all of them (or if they would be any use).
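
To illustrate why clearing a ragged column leaves referential integrity intact, here is a pure-Python sketch of the storage layout: a flat data buffer plus an offsets array of length `num_rows + 1`, where row `j` is `data[offsets[j]:offsets[j + 1]]` (the layout tskit uses for columns such as `metadata`/`metadata_offset`). Plain `bytes` and lists stand in for tskit's numpy arrays, and the helper names are hypothetical.

```python
def unpack(data, offsets):
    """Split a flat ragged column into per-row values using its offsets."""
    return [data[offsets[j]:offsets[j + 1]] for j in range(len(offsets) - 1)]


def cleared(num_rows):
    """Return (data, offsets) for a ragged column where every row is empty."""
    # Every offset is 0, so each row slices to an empty value.
    return b"", [0] * (num_rows + 1)


data = b"abcde"
offsets = [0, 2, 2, 5]  # three rows: b"ab", b"", b"cde"
rows_before = unpack(data, offsets)

new_data, new_offsets = cleared(len(offsets) - 1)
rows_after = unpack(new_data, new_offsets)
```

The row count is unchanged and no other column is touched, so any IDs that refer to these rows still point at the same rows; only their contents are gone.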

We could easily add a method clear_metadata here that would be inherited by all tables that have metadata, so I think that's an easy addition and not too much complexity.
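
A sketch of the inheritance idea, assuming a shared base class for metadata-bearing tables. All class names below (`MetadataTableMixin`, `ToyTable`, `NodeTableSketch`, `EdgeTableSketch`) are hypothetical stand-ins, not tskit classes; the point is only that one definition is picked up by every subclass.

```python
class MetadataTableMixin:
    """Hypothetical base class: every table inheriting it gets clear_metadata()."""

    def clear_metadata(self):
        # Same packset_metadata pattern as the original one-liner.
        self.packset_metadata([b""] * self.num_rows)


class ToyTable:
    """Minimal stand-in for a table: just a list of per-row metadata."""

    def __init__(self, metadata):
        self.metadata = list(metadata)

    @property
    def num_rows(self):
        return len(self.metadata)

    def packset_metadata(self, metadatas):
        if len(metadatas) != self.num_rows:
            raise ValueError("need exactly one metadata entry per row")
        self.metadata = list(metadatas)


class NodeTableSketch(MetadataTableMixin, ToyTable):
    pass


class EdgeTableSketch(MetadataTableMixin, ToyTable):
    pass
```

Because the method is defined once, adding it costs a few lines regardless of how many table classes inherit it.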

We do need to think about whether this operation should examine the schema though, and see if the result is compatible with the schema. I think probably not?

jeromekelleher • Mar 08 '22 10:03

> Any ragged column could be cleared without affecting the referential integrity.

True. I think what I meant is that it doesn't change the "look" of the tree sequence to tskit, which is not the case if you clear, e.g., the ancestral_state column.

> We could easily add a method clear_metadata here that would be inherited by all tables that have metadata, so I think that's an easy addition and not too much complexity.

Neat. I think this is reasonably useful.

> We do need to think about whether this operation should examine the schema though, and see if the result is compatible with the schema. I think probably not?

Probably not, although perhaps the only time it would fail (I think) is if it is a struct schema without "null" in the top-level type union? So that might be an easy check?
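
For concreteness, the heuristic in this comment could be sketched as below, assuming the schema is available as a plain dict with a `codec` key and a top-level `type` that is either a string or a list (or `None` when there is no schema). This encodes only the single failure mode named here; the function name is hypothetical, and as the next reply points out, it is not a complete compatibility check.

```python
def empty_metadata_allowed(schema):
    """Heuristic only: flag struct schemas lacking "null" at the top level."""
    if schema is None:
        # No schema: metadata is raw bytes, so empty bytes are acceptable.
        return True
    if schema.get("codec") != "struct":
        # Assumption: this heuristic makes no claim about other codecs.
        return True
    top_type = schema.get("type")
    types = top_type if isinstance(top_type, list) else [top_type]
    return "null" in types
```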

hyanwong • Mar 08 '22 10:03

No, there's any number of different ways it could fail I'm afraid, and without higher-level metadata APIs we're wasting our time trying to enumerate them.

jeromekelleher • Mar 08 '22 10:03

> No, there's any number of different ways it could fail I'm afraid, and without higher-level metadata APIs we're wasting our time trying to enumerate them.

Fine - happy to avoid this check then.

hyanwong • Mar 08 '22 10:03

Happy with this idea - although I think it is trivial to check against the schema and worth doing. I'd add a force argument to override the check.
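
One way this might look, as a sketch rather than a settled design: before clearing, round-trip an empty row through the table's schema (decode `b""`, then validate and re-encode the result) and refuse unless `force=True`. It assumes the table exposes `metadata_schema` with `decode_row` and `validate_and_encode_row` methods, as tskit's `MetadataSchema` does; the `clear_metadata` name and `force` parameter are the hypothetical API proposed in this thread. Catching a broad `Exception` is deliberate, since different codecs may fail in different ways.

```python
def clear_metadata(table, force=False):
    """Hypothetical: clear all metadata, refusing if the schema rejects empty rows."""
    if not force:
        schema = table.metadata_schema
        try:
            # Decode an empty row, then check the decoded value validates.
            schema.validate_and_encode_row(schema.decode_row(b""))
        except Exception as err:
            raise ValueError(
                "metadata schema does not accept empty rows; "
                "pass force=True to clear anyway"
            ) from err
    table.packset_metadata([b""] * table.num_rows)
```

With `force=True` the check is skipped entirely, which covers cases where the user intends to set a new schema straight afterwards.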

benjeffery • Mar 08 '22 10:03