tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Assign a table to a TableCollection

Open benjeffery opened this issue 3 years ago • 12 comments

Currently this results in an error:

>>> tc = tskit.TableCollection(1)
>>> tc.individuals=tskit.IndividualTable()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

Is there any reason that it shouldn't be possible? The implementation of this would seem to be clearing the existing table and copying the table data across.

Came up in the context of https://github.com/tskit-dev/tskit/pull/1487#discussion_r648387416

benjeffery avatar Jun 09 '21 15:06 benjeffery

Makes sense to me!

petrelharp avatar Jun 09 '21 15:06 petrelharp

However, since all these tables refer to each other, it doesn't seem that common that we'd actually have an IndividualTable that makes sense by itself - although I see the counter-example in #1487.

Or, I could imagine a nice way to modify, say, a mutation table would be

muts = tskit.MutationTable()
for mut in tc.mutations:
  muts.add_row( <some modification of mut> )
tc.mutations = muts

... although, now we could do

old_muts = tc.mutations
tc.mutations.clear()
for mut in old_muts:
  tc.mutations.add_row( ... )

petrelharp avatar Jun 09 '21 15:06 petrelharp

I agree, we have a lot of code that does the copy-clear-mutate-add/append dance. I prefer making a new table then assigning it as in your first example.

benjeffery avatar Jun 09 '21 16:06 benjeffery

Sure, it makes sense. We would need to be clear that the semantics is this though:

old_table = tc.individuals
tc.individuals = new_table
assert tc.individuals == new_table
assert tc.individuals is not new_table
assert tc.individuals is old_table

The fact that we're not actually assigning the object itself is what put me off this in the past, as that seems a bit counter intuitive and contrary to what you might think of the semantics of assignment. The object ownership stuff holding the tables in a TableCollection is very tricky, and I definitely don't want to go messing with that.

jeromekelleher avatar Jun 10 '21 07:06 jeromekelleher

I guess another way of doing this would be to have a method replace or something, which does the same thing, which doesn't require us getting into these murky waters:

tc.individuals.replace(new_table)
assert tc.individuals == new_table
assert tc.individuals is not new_table

I'm not sure about the name, but "Replace the contents of this table with the contents of the specified table" seems like a clear definition?

jeromekelleher avatar Jun 10 '21 07:06 jeromekelleher

I agree that it would be better to have this be a named method, rather than doing sneaky modification on assingment. We should add a helpful message to the error on assignment that signposts the named function. How about replace_with?

benjeffery avatar Jun 10 '21 09:06 benjeffery

Yes, that sounds good.

jeromekelleher avatar Jun 10 '21 11:06 jeromekelleher

If we don't just get to assign with = then I'm not sure there'd be a big advantage over the current way:

old_muts = tc.mutations.copy()
tc.mutations.clear()
for mut in old_muts:
  tc.mutations.add_row( ... )

petrelharp avatar Jun 10 '21 12:06 petrelharp

No harm in having two ways of doing it, though. It would make it more natural to pass around tables on their own.

jeromekelleher avatar Jun 10 '21 13:06 jeromekelleher

Presumably the more efficient way is to set the columns rather than add each row (as this doesn't check the metadata):

import msprime
ts = msprime.simulate(10)
tables = ts.dump_tables()
node_map = np.arange(ts.num_nodes)[::-1]
new_table = ts.tables.nodes[node_map]  # new table with a reversed node order
tables.nodes.set_columns(
    flags=new_table.flags,
    time=new_table.time,
    population=new_table.population,
    individual=new_table.individual,
    metadata=new_table.metadata,
    metadata_offset=new_table.metadata_offset,
)

I think this should be fairly efficient?

hyanwong avatar Jul 07 '22 13:07 hyanwong

Actually, should it be as simple as this in the BaseTable class:

    def replace_with(self, other):
        """
        Entirely replace this table with another
        """
        if type(self) != type(other):
            raise ValueError("replacement column is not of the same type")
        tables.nodes.set_columns(
            metadata_schema = repr(other.metadata_schema),  # encode the schema
            **{column: getattr(other, column) for column in self.column_names}
        )

hyanwong avatar Jul 07 '22 13:07 hyanwong

I'm reopening this as we haven't done assignment, but also haven't decided to not do it!

benjeffery avatar Jul 13 '22 11:07 benjeffery