Description

Fixes #1294. Should probably add some tests for attributes of the graph like branch_length, left, and right.

PR Checklist:

[x] Tests that fully cover new/changed functionality.
[x] Documentation including tutorial content if appropriate.
[x] Changelogs, if there are API changes.

Apr 02 '21 10:04 hyanwong

Codecov Report

Merging #1296 (935d052) into main (f277006) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1296   +/-   ##
=======================================
  Coverage   93.81%   93.81%           
=======================================
  Files          26       26           
  Lines       22187    22198   +11     
  Branches     1006     1009    +3     
=======================================
+ Hits        20814    20825   +11     
  Misses       1340     1340           
  Partials       33       33

Flag	Coverage Δ
c-tests	`92.44% <ø> (ø)`
lwt-tests	`92.97% <ø> (ø)`
python-c-tests	`95.15% <100.00%> (+<0.01%)`	:arrow_up:
python-tests	`98.86% <100.00%> (+<0.01%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
python/tskit/trees.py	`97.91% <100.00%> (+0.01%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f277006...935d052. Read the comment docs.

Apr 02 '21 10:04 codecov[bot]

Very neat! Looks very nice, but I think we want to be sure that this is the "right" graph representation before we make it part of the public API. Some questions:

What do these graphs look like when we plot them? I.e., what does a single tree ts look like, and one with two trees?
What would be need to do to round trip the topology? Ideally we would want

d = ts.as_dict_of_dicts()
tsp = tskit.TreeSequence.from_dict_of_dicts(d)
assert tsp.equals(ts, ignore_provenance=True)

We would need to have some visibilty on whether this is possible before we can fix on the graph representation.

Apr 06 '21 08:04 jeromekelleher

Nice! In the round trip example @jeromekelleher, do you mean that just the topology would round trip? Or other things like populations and migrations? (I guess these could be included as an attribute at the root dict?)

Apr 06 '21 10:04 benjeffery

Just the topology initially, we don't have to worry about the other stuff like individuals, migrations etc. But yes, in the case of populations it'd probably be simplest to add them in as attrs in the root dict.

Apr 06 '21 11:04 jeromekelleher

AFAIK when NetworkX imports from dict of dicts you can only set attributes on edges: I don't know how to set them on nodes. So round tripping might be difficult. Even setting node times would have to be imperfectly guessed from edge lengths.

Apr 06 '21 11:04 hyanwong

AFAIK when NetworkX imports from dict of dicts you can only set attributes on edges: I don't know how to set them on nodes. So round tripping might be difficult. Even setting node times would have to be imperfectly guessed from edge lengths.

We would round-trip the dicts, not the NetworkX representation.

Apr 06 '21 11:04 benjeffery

Yes - I'm afraid we have to do the due diligence here @hyanwong. We need to spend time exploring the properties of this representation if we're doing to make it part of the API. The first question we need to answer is "is this a bijection"?

Apr 06 '21 11:04 jeromekelleher

I assumed this would simply be an analogue to Tree.as_dict_of_dicts, so purely a (non complete) export format for NetworkX. If you want it to be something more, then it gets complex, I agree.

Apr 06 '21 12:04 hyanwong

We don't have a Tree.from_dod method, so I'm not sure we need (want?) a TS one?

Apr 06 '21 12:04 hyanwong

Apologies for the terse replies: I'm on a mobile. There's no reason to merge this any time soon. As it is, I need to use a slightly bespoke version anyway, which randomises the node IDs (otherwise locating isomorphic nodes is biased towards the input order). I just imagine that if I find a function like this useful for analysing the tree sequence as a graph, others might too. So it's probably worth having a similar function in the API. But as you say, worth doing the due diligence first. I'm in no hurry either way.

Apr 06 '21 13:04 hyanwong

OK - since we're making quite a big decision here (what's the graph representation of a tree sequence) @hyanwong, I think it would be better if you used the version you have for your own work, and we took our time with this. The Tree version is much simpler and we don't worry about round tripping it because there's no way of creating a Tree on it's own anyway. The TreeSequence is the core data structure.

So, I think we can mark this as a draft?

Apr 06 '21 13:04 jeromekelleher

So, I think we can mark this as a draft?

Done.

FWIW I don't think as_dict_of_dict should be the "canonical" graph representation of a TS. I think this only needs to be a function used for reading a specific format into NetworkX. We shouldn't claim anything grander about it. If we want a canonical format we could just use .__dict__() (or something) instead.

Apr 06 '21 14:04 hyanwong

Could we update this method to return the representation I'm advocating for here @hyanwong? https://github.com/tskit-dev/tskit/discussions/2068#discussioncomment-1821721

I think it's better to annotate the nodes with their time, and the edges with intervals, rather than making a multigraph with branch lengths like we have here. Making the multigraph version for viz should be simple enough as a post-processing step.

Dec 16 '21 12:12 jeromekelleher

Could we update this method to return the representation I'm advocating for here @hyanwong? #2068 (reply in thread)

I haven't figured out how to use the "dict_of_dicts" version to allocate attributes (like time) to nodes. So maybe this isn't the right way anyway?

I think it's better to annotate the nodes with their time, and the edges with intervals, rather than making a multigraph with branch lengths like we have here. Making the multigraph version for viz should be simple enough as a post-processing step.

Hmm, is that true. Isn't it just as easy to go from multigraph -> graph as the other way round?

Dec 16 '21 13:12 hyanwong

Add dict_of_dicts to tree sequence

Description

PR Checklist:

Codecov Report