augur
Add notebook for parsing JSON to ETE tree objects for downstream analysis
Description of proposed changes
Documents how to parse Nextstrain JSON files into tree objects from a widely used library for downstream phylogenetic analysis. I suspect a lot of people are reinventing this code, and it is likely a stumbling block for folks looking to integrate with other tools in the ecosystem.
I opted for the ETE Toolkit because (1) it handles large trees well and has well-thought-out built-in methods for tree traversal and manipulation; (2) it handles custom metadata attributes on each leaf and internal node quite neatly (crucial for Nextstrain's highly annotated trees); and (3) it has a nice visualization library.
At the bottom, I also included an example of how to use baltic-style plotting (although not actually the baltic library itself).
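For readers without the notebook open, the core idea of the conversion can be sketched roughly as below. This is a minimal illustration and not the notebook's actual code: the helper names (`walk`, `branch_length`, `flat_attrs`, `to_ete`) are hypothetical, and it assumes the v2 layout in which each node carries `name`, `node_attrs` (with a cumulative `div`), and an optional `children` list. The `ete3` import is deferred into `to_ete` so the traversal helpers can be used without ETE installed.

```python
def walk(root):
    """Yield (node, parent) pairs in preorder, without recursion.

    An explicit stack avoids Python's recursion limit on deep trees.
    """
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        yield node, parent
        # Push children in reverse so they are visited in file order.
        for child in reversed(node.get("children", [])):
            stack.append((child, node))


def branch_length(node, parent):
    """Branch length as the difference in cumulative divergence (assumed `div`)."""
    if parent is None:
        return 0.0
    div = node.get("node_attrs", {}).get("div", 0.0)
    parent_div = parent.get("node_attrs", {}).get("div", 0.0)
    return div - parent_div


def flat_attrs(node):
    """Unwrap {"value": x} entries in node_attrs into plain key -> x pairs."""
    flat = {}
    for key, value in node.get("node_attrs", {}).items():
        if isinstance(value, dict) and "value" in value:
            flat[key] = value["value"]
        else:
            flat[key] = value
    return flat


def to_ete(tree_json):
    """Build an ete3 tree from the nested "tree" object (requires ete3)."""
    from ete3 import TreeNode  # deferred so the helpers above stay stdlib-only

    ete_nodes = {}
    root = None
    for node, parent in walk(tree_json):
        ete_node = TreeNode(name=node.get("name"), dist=branch_length(node, parent))
        # Assumes no attribute key collides with ETE's own names (e.g. "children").
        ete_node.add_features(**flat_attrs(node))
        ete_nodes[id(node)] = ete_node
        if parent is None:
            root = ete_node
        else:
            ete_nodes[id(parent)].add_child(ete_node)
    return root
```

With the example file, usage would look something like `to_ete(json.load(open("auspice/examples/v2.json"))["tree"])`, after which the usual ETE calls such as `traverse()` and `iter_leaves()` apply. Only `node_attrs` is copied here; `branch_attrs` (mutations, labels) would need similar handling.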
Related issue(s)
Fixes #814
Testing
The notebook uses the tree available at auspice/examples/v2.json as its example. I've also run it on several SARS-CoV-2 JSONs over the past month.