pyani
Add `pyani tree` subcommand
Summary:
Add Newick tree output to pyani.
Description:
In #186 a question was raised about generating trees from pyani directly. This isn't implemented at the moment, but could be done fairly readily. One possible command-line interface for writing tree files might be:
pyani tree --formats [newick,nexus] <output_dir> <run ID>
or for graphical output:
pyani plot --formats [png,pdf] --method ete3 <output_dir> <run ID>
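A minimal sketch of how the proposed `pyani tree` command line could be parsed with `argparse`. This is hypothetical and not taken from pyani's actual parser code: the names `build_parser`, `parse_formats`, `outdir` and `run_id` are assumptions made for illustration, and only the `tree` form from the proposal is covered.

```python
import argparse
from pathlib import Path

ALLOWED_FORMATS = {"newick", "nexus"}  # formats named in the proposal


def parse_formats(value):
    """Split a comma-separated list such as 'newick,nexus' and validate it."""
    formats = [item.strip().lower() for item in value.split(",") if item.strip()]
    unknown = sorted(set(formats) - ALLOWED_FORMATS)
    if not formats or unknown:
        raise argparse.ArgumentTypeError(f"unsupported format list: {value!r}")
    return formats


def build_parser():
    """Build a parser for: pyani tree --formats newick,nexus <output_dir> <run ID>."""
    parser = argparse.ArgumentParser(prog="pyani")
    subparsers = parser.add_subparsers(dest="command", required=True)
    tree = subparsers.add_parser("tree", help="write tree files for a run")
    tree.add_argument("--formats", type=parse_formats, default=["newick"])
    tree.add_argument("outdir", type=Path, help="output directory")
    tree.add_argument("run_id", type=int, help="ID of the run to export")
    return parser


args = build_parser().parse_args(["tree", "--formats", "newick,nexus", "out", "1"])
```

Taking the formats as one comma-separated value mirrors the `[newick,nexus]` notation in the proposal; repeating the option once per format would be an equally reasonable design.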
Current Output:
Not implemented
pyani Version:
v0.3+
I was just looking to see if this was implemented, as I wanted to extract a Newick format tree from pyANI to compare to another clustering method, e.g. visually with https://phylo.io/ https://doi.org/10.1093/molbev/msw080
In the interim, it ought to be straightforward to produce a dendrogram in R or similar by taking in the ANI matrix and using hclust. You'll then get more control over the clustering method (rather than trusting our choice).
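For anyone who prefers Python to R, the same idea can be sketched with the standard library alone. The example below runs average-linkage (UPGMA) clustering on a distance matrix and emits Newick. It is an illustration under stated assumptions, not pyani's own clustering: the function name `upgma_newick`, the labels and the identity values are invented, and taking distance as `1 - identity` is just one possible choice.

```python
def upgma_newick(labels, dist):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns a Newick string with branch lengths.
    """
    # Active clusters: id -> (newick text, node height, number of leaves)
    clusters = {i: (label, 0.0, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by the pair of cluster ids
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        text_a, h_a, n_a = clusters.pop(a)
        text_b, h_b, n_b = clusters.pop(b)
        height = d.pop(pair) / 2.0        # new node sits halfway between them
        merged = f"({text_a}:{height - h_a:.4f},{text_b}:{height - h_b:.4f})"
        # Size-weighted mean distance from the merged cluster to every other one
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[next_id] = (merged, height, n_a + n_b)
        next_id += 1
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


# Invented identity values for three genomes; distance is taken as 1 - identity.
labels = ["A", "B", "C"]
identity = [[1.00, 0.98, 0.90],
            [0.98, 1.00, 0.91],
            [0.90, 0.91, 1.00]]
distance = [[1.0 - value for value in row] for row in identity]
tree = upgma_newick(labels, distance)
```

Swapping the size-weighted mean for a minimum or maximum would give single or complete linkage, which is the kind of control over the clustering method mentioned above.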
Yep, I will try the R code snippet on the linked issue for this.
@baileythegreen is working on this just now. There should be a pyani tree option in v0.3 coming soon.
I have initially implemented this as an option within pyani plot, so that Newick-formatted output and plotted dendrograms are created using an additional option in the pyani plot parser. This doesn't yet afford much control over which Newick files or dendrograms are created, but that will be possible eventually.
The tree_186 branch will currently create a Newick file and a dendrogram for each axis of each matrix, named accordingly. Species names (along with the genome number) get added to the dendrograms, but the Newick files only show the numbers.
If you wanted to try this now, from the tree_186 branch you would run:
pyani plot -o <output_file> --run_id <run_id> --dbpath <database_file> -l <log_file> --tree
but be warned it will generate 16 plots and 10 Newick files.
It also adds some new dependencies.
Newick file output has been modified so there is only one file, with the 'name' of the tree inside a comment.
Thus far, I know this format works both with libraries I have tested in Python, and with the https://phylo.io site Peter linked.
[col_newick_identity_run1] ((((((((7:0.05,2:0.05):0.20,4:0.25):0.02,(34:0.16,30:0.16):0.11):0.04,((39:0.01,11:0.01):0.00,43:0.01):0.29):0.03,(((44:0.02,23:0.02):0.16,(12:0.01,5:0.01):0.17):0.01,((41:0.01,26:0.01):0.12,49:0.14):0.06):0.15):0.02,((25:0.00,19:0.00):0.06,27:0.07):0.29):0.04,((((42:0.01,29:0.01):0.01,45:0.02):0.21,((48:0.00,22:0.00):0.00,8:0.00):0.23):0.02,((46:0.01,10:0.01):0.00,18:0.01):0.24):0.15):0.05,(((((47:0.03,20:0.03):0.02,35:0.05):0.18,36:0.23):0.03,((40:0.03,33:0.03):0.04,13:0.06):0.20):0.07,((((((28:0.00,21:0.00):0.02,24:0.03):0.11,((38:0.00,14:0.00):0.00,37:0.00):0.13):0.03,((31:0.00,6:0.00):0.05,17:0.05):0.12):0.01,((32:0.00,9:0.00):0.01,15:0.01):0.17):0.06,((16:0.01,1:0.01):0.00,3:0.01):0.24):0.08):0.11);
[row_newick_identity_run1] ((((((((7:0.05,2:0.05):0.20,4:0.25):0.02,(34:0.16,30:0.16):0.11):0.04,((39:0.01,11:0.01):0.00,43:0.01):0.29):0.03,(((44:0.02,23:0.02):0.16,(12:0.01,5:0.01):0.17):0.01,((41:0.01,26:0.01):0.12,49:0.14):0.06):0.15):0.02,((25:0.00,19:0.00):0.06,27:0.07):0.29):0.04,((((42:0.01,29:0.01):0.01,45:0.02):0.21,((48:0.00,22:0.00):0.00,8:0.00):0.23):0.02,((46:0.01,10:0.01):0.00,18:0.01):0.24):0.15):0.05,(((((47:0.03,20:0.03):0.02,35:0.05):0.18,36:0.23):0.03,((40:0.03,33:0.03):0.04,13:0.06):0.20):0.07,((((((28:0.00,21:0.00):0.02,24:0.03):0.11,((38:0.00,14:0.00):0.00,37:0.00):0.13):0.03,((31:0.00,6:0.00):0.05,17:0.05):0.12):0.01,((32:0.00,9:0.00):0.01,15:0.01):0.17):0.06,((16:0.01,1:0.01):0.00,3:0.01):0.24):0.08):0.11);
[col_newick_coverage_run1] ((((((7:0.15,2:0.15):0.89,4:1.04):0.23,((39:0.07,11:0.07):0.07,43:0.15):1.12):0.92,((((42:0.02,29:0.02):0.05,45:0.07):0.55,((48:0.01,22:0.01):0.01,8:0.01):0.61):0.04,((46:0.07,10:0.07):0.29,18:0.36):0.30):1.53):0.66,((((41:0.09,26:0.09):0.08,49:0.17):0.09,(44:0.11,23:0.11):0.14):0.09,(12:0.10,5:0.10):0.23):2.51):0.63,(((34:0.39,30:0.39):1.33,((25:0.01,19:0.01):0.24,27:0.25):1.47):0.76,(((((47:0.19,20:0.19):0.04,35:0.24):0.36,((40:0.13,33:0.13):0.05,13:0.18):0.42):0.09,36:0.68):0.18,((((((31:0.02,6:0.02):0.12,17:0.14):0.18,((38:0.01,37:0.01):0.02,14:0.03):0.29):0.01,((28:0.14,21:0.14):0.06,24:0.20):0.13):0.19,((32:0.01,9:0.01):0.10,15:0.11):0.40):0.02,((16:0.13,3:0.13):0.01,1:0.14):0.40):0.33):1.61):1.00);
[row_newick_coverage_run1] (((((((39:0.07,11:0.07):0.07,43:0.15):0.82,4:0.97):0.32,(7:0.15,2:0.15):1.13):0.91,((((46:0.08,10:0.08):0.14,18:0.22):0.37,((42:0.02,29:0.02):0.06,45:0.08):0.51):0.06,((48:0.01,22:0.01):0.01,8:0.01):0.64):1.55):0.67,((((41:0.09,26:0.09):0.08,49:0.17):0.08,(44:0.11,23:0.11):0.14):0.08,(12:0.12,5:0.12):0.21):2.52):0.60,(((34:0.37,30:0.37):1.35,((25:0.01,19:0.01):0.19,27:0.20):1.52):0.68,(((((47:0.19,20:0.19):0.05,35:0.24):0.33,36:0.57):0.03,((40:0.16,33:0.16):0.03,13:0.19):0.41):0.19,((((((28:0.10,21:0.10):0.10,24:0.20):0.12,((38:0.02,37:0.02):0.01,14:0.03):0.29):0.01,((31:0.02,6:0.02):0.12,17:0.14):0.19):0.09,((16:0.14,1:0.14):0.06,3:0.20):0.22):0.08,((32:0.01,9:0.01):0.15,15:0.16):0.34):0.28):1.61):1.06);
[col_newick_aln_lengths_run1] ((((((34:1729666.71,30:1729666.71):6021310.16,((25:57790.43,19:57790.43):1102066.41,27:1159856.85):6591120.02):2813471.27,(((7:680495.33,2:680495.33):3917876.19,4:4598371.52):1855310.71,((39:385588.94,11:385588.94):396069.71,43:781658.64):5672023.59):4110765.91):127135.17,((((41:326365.60,26:326365.60):312321.94,49:638687.54):350057.03,(44:422480.70,23:422480.70):566263.87):301298.21,(12:403469.92,5:403469.92):886572.86):9401540.53):1776281.31,((((46:320686.03,10:320686.03):1422273.38,18:1742959.42):1314588.33,((42:76629.66,29:76629.66):268554.45,45:345184.11):2712363.63):218423.56,((48:29763.67,22:29763.67):37698.65,8:67462.33):3208508.97):9191893.31):3884533.89,(((((47:920299.65,20:920299.65):196364.64,35:1116664.29):1703901.93,((40:589103.58,33:589103.58):267177.40,13:856280.99):1964285.23):349841.61,36:3170407.83):988677.82,((((16:579547.62,3:579547.62):60666.90,1:640214.52):1885771.47,((32:70434.69,9:70434.69):477082.81,15:547517.50):1978468.48):51421.91,((((28:683319.16,21:683319.16):295324.41,24:978643.58):594529.05,((38:45706.33,37:45706.33):80795.18,14:126501.51):1446671.11):54092.57,((31:94617.41,6:94617.41):610986.06,17:705603.46):921661.73):950142.71):1581677.75):12193312.86);
[row_newick_aln_lengths_run1] ((((((34:1729666.71,30:1729666.71):6021310.16,((25:57790.43,19:57790.43):1102066.41,27:1159856.85):6591120.02):2813471.27,(((7:680495.33,2:680495.33):3917876.19,4:4598371.52):1855310.71,((39:385588.94,11:385588.94):396069.71,43:781658.64):5672023.59):4110765.91):127135.17,((((41:326365.60,26:326365.60):312321.94,49:638687.54):350057.03,(44:422480.70,23:422480.70):566263.87):301298.21,(12:403469.92,5:403469.92):886572.86):9401540.53):1776281.31,((((46:320686.03,10:320686.03):1422273.38,18:1742959.42):1314588.33,((42:76629.66,29:76629.66):268554.45,45:345184.11):2712363.63):218423.56,((48:29763.67,22:29763.67):37698.65,8:67462.33):3208508.97):9191893.31):3884533.89,(((((47:920299.65,20:920299.65):196364.64,35:1116664.29):1703901.93,((40:589103.58,33:589103.58):267177.40,13:856280.99):1964285.23):349841.61,36:3170407.83):988677.82,((((16:579547.62,3:579547.62):60666.90,1:640214.52):1885771.47,((32:70434.69,9:70434.69):477082.81,15:547517.50):1978468.48):51421.91,((((28:683319.16,21:683319.16):295324.41,24:978643.58):594529.05,((38:45706.33,37:45706.33):80795.18,14:126501.51):1446671.11):54092.57,((31:94617.41,6:94617.41):610986.06,17:705603.46):921661.73):950142.71):1581677.75):12193312.86);
[col_newick_sim_errors_run1] (((((((31:7951.26,6:7951.26):226334.73,17:234286.00):432930.35,((32:5819.48,9:5819.48):35993.47,15:41812.94):625403.40):45283.52,(((28:61050.96,21:61050.96):69708.02,24:130758.98):448708.50,((38:5065.01,37:5065.01):3643.35,14:8708.36):570759.12):133032.38):178830.84,((16:47324.48,1:47324.48):1570.55,3:48895.03):842435.68):70755.85,((((47:131768.58,20:131768.58):73675.48,35:205444.06):454534.31,36:659978.37):126937.89,((40:107620.12,33:107620.12):147284.15,13:254904.27):532012.00):175170.29):402086.76,(((((44:74423.27,23:74423.27):486600.08,(12:26988.70,5:26988.70):534034.65):32481.97,((41:50073.89,26:50073.89):403490.32,49:453564.21):139941.11):248590.32,((((25:4576.29,19:4576.29):266115.40,27:270691.68):75627.70,30:346319.38):82470.23,34:428789.61):413306.03):67447.45,((((((7:184456.97,2:184456.97):227902.71,4:412359.68):98009.14,((39:55158.79,11:55158.79):4652.39,43:59811.18):450557.63):89079.90,((46:28134.13,10:28134.13):120563.12,18:148697.25):450751.46):67935.11,((42:8742.68,29:8742.68):87944.86,45:96687.55):570696.28):66128.94,((48:1673.39,22:1673.39):1350.79,8:3024.18):730488.58):176030.33):454630.22);
[row_newick_sim_errors_run1] (((((((31:7951.26,6:7951.26):226334.73,17:234286.00):432930.35,((32:5819.48,9:5819.48):35993.47,15:41812.94):625403.40):45283.52,(((28:61050.96,21:61050.96):69708.02,24:130758.98):448708.50,((38:5065.01,37:5065.01):3643.35,14:8708.36):570759.12):133032.38):178830.84,((16:47324.48,1:47324.48):1570.55,3:48895.03):842435.68):70755.85,((((47:131768.58,20:131768.58):73675.48,35:205444.06):454534.31,36:659978.37):126937.89,((40:107620.12,33:107620.12):147284.15,13:254904.27):532012.00):175170.29):402086.76,(((((44:74423.27,23:74423.27):486600.08,(12:26988.70,5:26988.70):534034.65):32481.97,((41:50073.89,26:50073.89):403490.32,49:453564.21):139941.11):248590.32,((((25:4576.29,19:4576.29):266115.40,27:270691.68):75627.70,30:346319.38):82470.23,34:428789.61):413306.03):67447.45,((((((7:184456.97,2:184456.97):227902.71,4:412359.68):98009.14,((39:55158.79,11:55158.79):4652.39,43:59811.18):450557.63):89079.90,((46:28134.13,10:28134.13):120563.12,18:148697.25):450751.46):67935.11,((42:8742.68,29:8742.68):87944.86,45:96687.55):570696.28):66128.94,((48:1673.39,22:1673.39):1350.79,8:3024.18):730488.58):176030.33):454630.22);
[col_newick_hadamard_run1] ((((((34:0.49,30:0.49):1.23,((25:0.01,19:0.01):0.28,27:0.29):1.43):0.41,(((7:0.19,2:0.19):0.93,4:1.12):0.24,((39:0.08,11:0.08):0.07,43:0.16):1.21):0.77):0.27,((((46:0.07,10:0.07):0.27,18:0.35):0.43,((42:0.02,29:0.02):0.07,45:0.09):0.69):0.05,((48:0.01,22:0.01):0.01,8:0.01):0.82):1.57):0.25,((((41:0.10,26:0.10):0.18,49:0.28):0.12,(44:0.13,23:0.13):0.28):0.08,(12:0.11,5:0.11):0.37):2.17):0.61,(((((47:0.21,20:0.21):0.06,35:0.28):0.49,((40:0.15,33:0.15):0.08,13:0.23):0.54):0.04,36:0.80):0.25,((((((28:0.13,21:0.13):0.09,24:0.22):0.21,((38:0.01,37:0.01):0.02,14:0.02):0.41):0.03,((31:0.02,6:0.02):0.17,17:0.18):0.28):0.18,((32:0.01,9:0.01):0.11,15:0.12):0.53):0.05,((16:0.14,3:0.14):0.01,1:0.15):0.55):0.35):2.21);
[row_newick_hadamard_run1] ((((((34:0.48,30:0.48):1.24,((25:0.01,19:0.01):0.24,27:0.26):1.47):0.44,((((39:0.08,11:0.08):0.07,43:0.16):0.93,4:1.09):0.28,(7:0.19,2:0.19):1.17):0.80):0.25,((((46:0.08,10:0.08):0.14,18:0.22):0.53,((42:0.02,29:0.02):0.08,45:0.09):0.65):0.07,((48:0.01,22:0.01):0.01,8:0.01):0.80):1.59):0.25,((((41:0.10,26:0.10):0.18,49:0.28):0.12,(44:0.13,23:0.13):0.28):0.08,(12:0.12,5:0.12):0.35):2.18):0.57,(((((47:0.21,20:0.21):0.06,35:0.28):0.43,36:0.71):0.05,((40:0.17,33:0.17):0.06,13:0.24):0.52):0.22,((((((28:0.09,21:0.09):0.13,24:0.22):0.21,((38:0.02,37:0.02):0.01,14:0.02):0.41):0.04,((31:0.02,6:0.02):0.17,17:0.19):0.28):0.13,((16:0.15,1:0.15):0.05,3:0.20):0.40):0.05,((32:0.01,9:0.01):0.14,15:0.15):0.49):0.33):2.25);
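To pull individual trees back out of a combined file like the one above, something along these lines may help. It is a sketch that assumes one tree per line, each optionally preceded by a `[name]` comment as shown. The helper names `read_named_trees` and `leaf_labels` are invented here, and the sample trees are shortened stand-ins rather than the full output.

```python
import re

# One tree per line: an optional "[name]" comment, then Newick text ending in ";"
TREE_LINE = re.compile(r"^\s*(?:\[(?P<name>[^\]]*)\]\s*)?(?P<newick>\(.*;)\s*$")
# A leaf label directly follows "(" or "," and runs up to the next delimiter
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def read_named_trees(lines):
    """Map each tree's name (from its leading comment) to its Newick string."""
    trees = {}
    for number, line in enumerate(lines, start=1):
        match = TREE_LINE.match(line)
        if match:
            # Fall back to a positional name if the comment is missing or empty
            name = match.group("name") or f"tree_{number}"
            trees[name] = match.group("newick")
    return trees


def leaf_labels(newick):
    """Return the leaf labels of a Newick string, in order of appearance."""
    return LEAF.findall(newick)


# Shortened illustration; a real file would be read with open(path) instead.
sample = """\
[col_newick_identity_run1] ((7:0.05,2:0.05):0.20,4:0.25);
[row_newick_identity_run1] ((7:0.05,2:0.05):0.20,4:0.25);
"""
trees = read_named_trees(sample.splitlines())
labels = leaf_labels(trees["col_newick_identity_run1"])
```

Because the Newick files only carry genome numbers, the list returned by `leaf_labels` is what would need to be mapped back to species names before comparing against another clustering.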
Can confirm the format works with FigTree, thanks @baileythegreen
I wrongly assumed this was already on master. Is https://github.com/widdowquinn/pyani/commits/tree_186 the latest version of pyani plot --tree if I wanted to try this out?
It is. Right now it plots trees as part of the plotting subcommand, and does not offer much customisation. I am in the process of refining this, and also creating a separate subcommand that allows more customisation. All of that is to be done on the tree_186 branch.
There isn't a (draft) PR for tree_186 yet, is there? I would have comments; to start with, it needs to declare the added ete3 dependency.
Also I seem to be missing some graphical dependency stuff... getting these warnings multiple times (despite not wanting to actually use the display).
WARNING: QApplication was not created in the main() thread.
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
The above seems not to write a tree (i.e. something aborts after the warnings).
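Messages like these typically appear when Qt is started on a machine with no display, such as an SSH session or CI runner. One commonly used workaround is to select Qt's "offscreen" platform plugin through the `QT_QPA_PLATFORM` environment variable before anything loads Qt. The sketch below assumes the tree rendering goes through Qt via ete3, and it has not been tested against the tree_186 branch.

```python
import os

# Assumption: rendering goes through Qt (for example via ete3). Choose the
# "offscreen" platform plugin so Qt does not try to connect to an X display.
# This has to happen before the first import that loads Qt.
os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")

# Imports that pull in Qt would follow from here, for example:
# from ete3 import Tree, TreeStyle
```

The same variable can be exported in the shell before running the command, which avoids touching any code.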
It looks like right now the branch only adds the tree to the heatmap code in the seaborn plotting pyani/pyani_graphics/sns/__init__.py (others still to be implemented), but adding any --method XXX argument, including --method seaborn, seems to skip the trees.
Hi Peter,
There was not a draft PR for this; but I've made one here, to make it easier for you to give feedback/comments.
The ete3 dependency is listed in requirements.txt on the tree_186 branch; in master, this file is used when installing, but if you are using your normal installation just on this branch, that probably wouldn't happen, as that package has not been used elsewhere in pyani. You will also need PyQt (or PyQt5), which I think might address your graphical dependency warnings (I haven't seen those before). This is also listed in the requirements.txt file.
I've responded to your comment on the last commit here. That's an issue that I need to solve: the test suite on my computer seems happy with the current version, but CircleCI here is failing at this point. I'm testing a potential solution right now, but may also discuss this with Leighton tomorrow.
> It looks like right now the branch only adds the tree to the heatmap code in the seaborn plotting pyani/pyani_graphics/sns/__init__.py (others still to be implemented), but adding any --method XXX argument including --method seaborn seems to skip the trees.
I will look into this; it seems odd.
Thank you!
https://github.com/widdowquinn/pyani/pull/370