iterating over count_topologies method never stops
I would expect the Tree.count_topologies() method to return the Counter objects only for the valid sample sets, but iterating over a tskit.TopologyCounter seems never to end:
c = tskit.Tree.generate_star(3).count_topologies([[0],[1],[2]])
for a in c: # Infinite loop
print(a)
Am I misunderstanding something @daniel-goldstein ?
It seems like it should be for a in c.topologies()
From @daniel-goldstein :
So your example in the github issue is certainly a bug. The class does not properly implement the right methods to be iterable, so we should either implement those to essentially pass through to c.topologies, since that is properly iterable, or explicitly error if someone tries to iterate a topology counter. Unforunately it's probably not properly documented, I'm happy to fix that. I am in favor of adding the correct iteration methods and then writing at least an example in the documentation for topology counter along the lines of:
for species, topologies in c.values(): print(f'The distribution of topologies for the subset of species {species} is {topologies}')I think a proper tutorial and this clarifying documentation are both warranted
Also note that count_topologies is not actually mentioned or linked to from https://tskit.dev/tskit/docs/stable/combinatorics.html# which seems weird
Yes this is indeed a bug and __iter__ is not currently defined in the TopologyCounter, which it should be!
I think there's a small question of interface here which is how much to align TopologyCounter with a normal python dictionary. We explicitly wrap a defaultdict instead of extending one since the semantics of __getitem__ is slightly different than what you would normally get from a dictionary. But does that mean that iteration should work differently? I think the two options are:
-
Follow Python dictionary semantics where
iter(topology_counter)iterates over the keys (subsets of species) and the quoted iteration above would occur bytopology_counter.items(). In this case I would implement__iter__,keys,items. I could dovaluesbut I can't think of a context in whichvalueswithout knowing the relevant key would be useful and might just lead to confusion.keysis also not very necessary since it will always be the power set of species/populations/etc. but might be helpful for illustrative purposes. -
Intentionally deviate from the python dictionary interface and just make
__iter__yield tuples of (species, topologies) like in the above example. So you would write something of the form:
for species, topologies in t.count_topologies():
print(f'The distribution of topologies for the subset of species {species} is {topologies}')
I can PR a fix for whichever path makes the most sense from a user perspective.
I think 1. is probably easier to document - we simply say that we return a dictionary-like item. Or could we even convert the internal defaultdict straight to a normal dict when we return it? 1. also allows us just to get the keys, should such a thing be useful.
I'm fine with calling it a dictionary-like item, but don't think we can just return the internal dict because of the way we manipulate the keys when accessing items is different from normal dict semantics. I'll make a PR and we can move the discussion over there. Thanks!
Hi @daniel-goldstein - I am returning to the idea of fleshing out the combinatorics tutorial and hit this again. Did you ever get anywhere making a PR for this (I can't see any linked discussion)
Hi @hyanwong! My apologies, I don't believe I ever got to making a PR for this. Been a couple years since my head has been in tree sequences but I would be happy to try to answer any questions!