
Add notebook for parsing JSON to ETE tree objects for downstream analysis

Open · sidneymbell opened this issue 3 years ago · 0 comments

Description of proposed changes

Documents how to parse Nextstrain JSON files into tree objects from a common library for downstream phylogenetic analysis. I suspect many people are reinventing this code, and it can be a stumbling block for anyone trying to integrate Nextstrain output with other tools in the ecosystem.

I opted for the ETE toolkit because (1) it handles large trees well and has well-thought-out built-in methods for tree traversal and manipulation; (2) it handles custom metadata attributes on each leaf and internal node neatly (crucial for Nextstrain's heavily annotated trees); and (3) it has a good visualization library.
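A minimal sketch of the conversion, assuming the Auspice v2 JSON layout: a top-level `tree` whose nodes carry `name`, `node_attrs`, `branch_attrs`, and `children`. So that it runs without ETE installed, it builds a hypothetical `Node` dataclass standing in for `ete3.TreeNode`; `auspice_to_tree`, `flatten_attrs`, and `load_auspice_json` are illustrative names, not part of augur or ETE. Branch lengths are recovered from the cumulative divergence in `node_attrs["div"]`, and the walk is iterative because very deep trees may exceed Python's default recursion limit.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical minimal stand-in for ete3.TreeNode."""
    name: str
    dist: float = 0.0
    features: Dict[str, Any] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def flatten_attrs(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Unwrap Auspice-style {"value": ...} entries; keep plain values (e.g. div) as-is."""
    return {
        key: val["value"] if isinstance(val, dict) and "value" in val else val
        for key, val in attrs.items()
    }


def _make_node(node_json: Dict[str, Any], parent_div: float) -> Tuple[Node, float]:
    attrs = node_json.get("node_attrs", {})
    div = attrs.get("div", parent_div)  # cumulative divergence from the root
    node = Node(
        name=node_json.get("name", ""),
        dist=div - parent_div,  # branch length = change in divergence
        features=flatten_attrs(attrs),
    )
    mutations = node_json.get("branch_attrs", {}).get("mutations")
    if mutations:
        node.features["mutations"] = mutations
    return node, div


def auspice_to_tree(tree_json: Dict[str, Any]) -> Node:
    """Convert the `tree` object of an Auspice v2 JSON into nested Node objects."""
    root, root_div = _make_node(tree_json, 0.0)
    stack = [(tree_json, root, root_div)]
    # Iterative depth-first walk; children are appended in their original order.
    while stack:
        node_json, node, div = stack.pop()
        for child_json in node_json.get("children", []):
            child, child_div = _make_node(child_json, div)
            node.children.append(child)
            stack.append((child_json, child, child_div))
    return root


def load_auspice_json(path: str) -> Node:
    with open(path) as fh:
        return auspice_to_tree(json.load(fh)["tree"])
```

With ete3 installed, the same loop could instead call `parent.add_child(name=..., dist=...)` and then `add_features(**features)` on the returned node. Note that `flatten_attrs` keeps only each attribute's `value`, so confidence intervals and entropies are dropped; keep the raw dicts if you need them.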

At the bottom of the notebook, I also included an example of baltic-style plotting (without using the baltic library itself).
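The layout behind a baltic-style plot can be sketched as follows, assuming a common rectangular scheme: x is the cumulative divergence (`node_attrs["div"]`), leaves get consecutive integer y positions in depth-first order, and each internal node sits at the mean y of its children. `rectangular_layout` is a hypothetical helper that reads the Auspice JSON `tree` dict directly; it assumes every node has a `div` value and a unique `name`.

```python
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]


def rectangular_layout(
    tree_json: Dict[str, Any],
) -> Tuple[Dict[str, Point], List[Tuple[Point, Point]]]:
    """Compute node coordinates and line segments for a rectangular tree plot."""
    coords: Dict[str, Point] = {}
    segments: List[Tuple[Point, Point]] = []
    next_leaf_y = 0
    # Iterative post-order traversal: a node is placed only after its children.
    stack = [(tree_json, False)]
    while stack:
        node, children_done = stack.pop()
        children = node.get("children", [])
        if children and not children_done:
            stack.append((node, True))
            # Push in reverse so children are visited in their original order.
            for child in reversed(children):
                stack.append((child, False))
            continue
        x = node["node_attrs"]["div"]
        if children:
            ys = [coords[c["name"]][1] for c in children]
            y = sum(ys) / len(ys)
            # Vertical bar spanning this node's children.
            segments.append(((x, min(ys)), (x, max(ys))))
            # Horizontal branch from this node out to each child.
            for c in children:
                cx, cy = coords[c["name"]]
                segments.append(((x, cy), (cx, cy)))
        else:
            y = float(next_leaf_y)
            next_leaf_y += 1
        coords[node["name"]] = (x, y)
    return coords, segments
```

The returned segments can be passed to `matplotlib.collections.LineCollection` and added to an axis with `ax.add_collection`; coloring by a trait would mean grouping the horizontal segments by each child's `node_attrs` value.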

Related issue(s)

Fixes #814

Testing

Uses the tree available at auspice/examples/v2.json as an example. I've run this on several SARS-CoV-2 JSONs over the past month.

sidneymbell · Jan 05 '22 00:01