tskit
tskit copied to clipboard
Assign a table to a TableCollection
Currently this results in an error:
>>> tc = tskit.TableCollection(1)
>>> tc.individuals=tskit.IndividualTable()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
Is there any reason that it shouldn't be possible? The implementation of this would seem to be clearing the existing table and copying the table data across.
Came up in the context of https://github.com/tskit-dev/tskit/pull/1487#discussion_r648387416
Makes sense to me!
However, since all these tables refer to each other, it doesn't seem that common that we'd actually have an IndividualTable that makes sense by itself - although I see the counter-example in #1487.
Or, I could imagine a nice way to modify, say, a mutation table would be
muts = tskit.MutationTable()
for mut in tc.mutations:
muts.add_row( <some modification of mut> )
tc.mutations = muts
... although, now we could do
old_muts = tc.mutations
tc.mutations.clear()
for mut in old_muts:
tc.mutations.add_row( ... )
I agree, we have a lot of code that does the copy-clear-mutate-add/append dance. I prefer making a new table then assigning it as in your first example.
Sure, it makes sense. We would need to be clear that the semantics is this though:
old_table = tc.individuals
tc.individuals = new_table
assert tc.individuals == new_table
assert tc.individuals is not new_table
assert tc.individuals is old_table
The fact that we're not actually assigning the object itself is what put me off this in the past, as that seems a bit counter intuitive and contrary to what you might think of the semantics of assignment. The object ownership stuff holding the tables in a TableCollection is very tricky, and I definitely don't want to go messing with that.
I guess another way of doing this would be to have a method replace
or something, which does the same thing, which doesn't require us getting into these murky waters:
tc.individuals.replace(new_table)
assert tc.individuals == new_table
assert tc.individuals is not new_table
I'm not sure about the name, but "Replace the contents of this table with the contents of the specified table" seems like a clear definition?
I agree that it would be better to have this be a named method, rather than doing sneaky modification on assingment. We should add a helpful message to the error on assignment that signposts the named function. How about replace_with
?
Yes, that sounds good.
If we don't just get to assign with =
then I'm not sure there'd be a big advantage over the current way:
old_muts = tc.mutations.copy()
tc.mutations.clear()
for mut in old_muts:
tc.mutations.add_row( ... )
No harm in having two ways of doing it, though. It would make it more natural to pass around tables on their own.
Presumably the more efficient way is to set the columns rather than add each row (as this doesn't check the metadata):
import msprime
ts = msprime.simulate(10)
tables = ts.dump_tables()
node_map = np.arange(ts.num_nodes)[::-1]
new_table = ts.tables.nodes[node_map] # new table with a reversed node order
tables.nodes.set_columns(
flags=new_table.flags,
time=new_table.time,
population=new_table.population,
individual=new_table.individual,
metadata=new_table.metadata,
metadata_offset=new_table.metadata_offset,
)
I think this should be fairly efficient?
Actually, should it be as simple as this in the BaseTable class:
def replace_with(self, other):
"""
Entirely replace this table with another
"""
if type(self) != type(other):
raise ValueError("replacement column is not of the same type")
tables.nodes.set_columns(
metadata_schema = repr(other.metadata_schema), # encode the schema
**{column: getattr(other, column) for column in self.column_names}
)
I'm reopening this as we haven't done assignment, but also haven't decided to not do it!