tskit iterating over count_topologies method never stops

I would expect the Tree.count_topologies() method to return the Counter objects only for the valid sample sets, but iterating over a tskit.TopologyCounter seems never to end:

c = tskit.Tree.generate_star(3).count_topologies([[0],[1],[2]])
for a in c: # Infinite loop
    print(a)

Am I misunderstanding something @daniel-goldstein ?

May 23 '21 15:05 hyanwong

It seems like it should be for a in c.topologies()

From @daniel-goldstein :

So your example in the github issue is certainly a bug. The class does not properly implement the right methods to be iterable, so we should either implement those to essentially pass through to c.topologies, since that is properly iterable, or explicitly error if someone tries to iterate a topology counter. Unforunately it's probably not properly documented, I'm happy to fix that. I am in favor of adding the correct iteration methods and then writing at least an example in the documentation for topology counter along the lines of:
 for species, topologies in c.values():
 print(f'The distribution of topologies for the subset of species {species} is {topologies}')
I think a proper tutorial and this clarifying documentation are both warranted

Jun 22 '21 14:06 hyanwong

Also note that count_topologies is not actually mentioned or linked to from https://tskit.dev/tskit/docs/stable/combinatorics.html# which seems weird

Jun 22 '21 14:06 hyanwong

Yes this is indeed a bug and __iter__ is not currently defined in the TopologyCounter, which it should be!

I think there's a small question of interface here which is how much to align TopologyCounter with a normal python dictionary. We explicitly wrap a defaultdict instead of extending one since the semantics of __getitem__ is slightly different than what you would normally get from a dictionary. But does that mean that iteration should work differently? I think the two options are:

Follow Python dictionary semantics where iter(topology_counter) iterates over the keys (subsets of species) and the quoted iteration above would occur by topology_counter.items(). In this case I would implement __iter__, keys, items. I could do values but I can't think of a context in which values without knowing the relevant key would be useful and might just lead to confusion. keys is also not very necessary since it will always be the power set of species/populations/etc. but might be helpful for illustrative purposes.
Intentionally deviate from the python dictionary interface and just make __iter__ yield tuples of (species, topologies) like in the above example. So you would write something of the form:

for species, topologies in t.count_topologies():
	print(f'The distribution of topologies for the subset of species {species} is {topologies}')

I can PR a fix for whichever path makes the most sense from a user perspective.

Jun 23 '21 12:06 daniel-goldstein

I think 1. is probably easier to document - we simply say that we return a dictionary-like item. Or could we even convert the internal defaultdict straight to a normal dict when we return it? 1. also allows us just to get the keys, should such a thing be useful.

Jun 23 '21 13:06 hyanwong

I'm fine with calling it a dictionary-like item, but don't think we can just return the internal dict because of the way we manipulate the keys when accessing items is different from normal dict semantics. I'll make a PR and we can move the discussion over there. Thanks!

Jun 26 '21 22:06 daniel-goldstein

Hi @daniel-goldstein - I am returning to the idea of fleshing out the combinatorics tutorial and hit this again. Did you ever get anywhere making a PR for this (I can't see any linked discussion)

Nov 18 '23 11:11 hyanwong

Hi @hyanwong! My apologies, I don't believe I ever got to making a PR for this. Been a couple years since my head has been in tree sequences but I would be happy to try to answer any questions!

Nov 18 '23 17:11 daniel-goldstein