John Huddleston
Very nice! Thank you, @tsibley!
The point about a general deduplicate command might help us scope this issue. Deduplicating sequences (FASTA files) is a different process from deduplicating data frames. Sequences often don't include metadata...
We should definitely implement the first short-term solution. I'm not sure about the best place to have the discussion about deduplicating data. A tutorial (or how-to guide, depending on the...
While working on the seasonal flu builds with data from [FluDB](https://www.fludb.org/brc/home.spg?decorator=influenza), I noticed that [seqkit has an rmdup command](https://bioinf.shenwei.me/seqkit/usage/#rmdup) that can remove duplicate sequences by name or sequence content and...
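To make the "by name or sequence content" distinction concrete, here is a minimal pure-Python sketch of the two deduplication modes. It is not a reimplementation of seqkit and does not reflect how Augur would do this; the function names `read_fasta` and `remove_duplicates`, the `by` parameter, and the choice to compare sequences case-insensitively are all assumptions made for illustration.

```python
def read_fasta(lines):
    """Yield (name, sequence) tuples from an iterable of FASTA lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the record name is the first whitespace-delimited
            # token of the header line.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def remove_duplicates(records, by="name"):
    """Yield the first record seen for each distinct name or sequence.

    `by` is a hypothetical parameter: "name" or "sequence".
    """
    if by not in ("name", "sequence"):
        raise ValueError("by must be 'name' or 'sequence'")
    seen = set()
    for name, sequence in records:
        # Assumption: sequences are compared case-insensitively.
        key = name if by == "name" else sequence.upper()
        if key in seen:
            continue
        seen.add(key)
        yield name, sequence


example = """>A first copy
ACGT
>B
acgt
>A
TTTT
""".splitlines()

unique_by_name = list(remove_duplicates(read_fasta(example), by="name"))
unique_by_sequence = list(remove_duplicates(read_fasta(example), by="sequence"))
```

In this toy input, deduplicating by name keeps the first `A` and `B`, while deduplicating by sequence keeps both `A` records and drops `B`. The two modes can therefore disagree about which records are "duplicates", which is part of why the scope matters.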
@mvolz There isn't a command line interface planned for this functionality, but you can accomplish this conversion using existing augur python functions. ```python from augur.utils import json_to_tree import Bio.Phylo import...
Just a note that I ended up writing [a script to convert Auspice JSON to Newick tree and metadata TSV](https://gist.github.com/huddlej/5d7bd023d3807c698bd18c706974f2db). We haven't decided where this would live in Augur yet...
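For readers who only need the tree, the conversion idea can be sketched without Augur. This is not the linked gist and not an Augur API; it assumes an Auspice v2-style layout in which the top-level `tree` key holds nested nodes with `name`, optional `children`, and a cumulative divergence at `node_attrs.div`. The function names are hypothetical.

```python
def to_newick(node, parent_div=0.0):
    """Recursively serialize one node (and its descendants) to Newick.

    Assumption: `node_attrs.div` is cumulative divergence from the root, so
    a branch length is the difference between a node's and its parent's div.
    """
    div = node.get("node_attrs", {}).get("div", parent_div)
    length = div - parent_div
    label = node.get("name", "")
    children = node.get("children", [])
    if children:
        inner = ",".join(to_newick(child, div) for child in children)
        return f"({inner}){label}:{length:g}"
    return f"{label}:{length:g}"


def auspice_json_to_newick(auspice):
    """Return a Newick string for the tree stored under the `tree` key."""
    return to_newick(auspice["tree"]) + ";"


example = {
    "tree": {
        "name": "root",
        "node_attrs": {"div": 0},
        "children": [
            {"name": "A", "node_attrs": {"div": 0.1}},
            {"name": "B", "node_attrs": {"div": 0.25}},
        ],
    }
}

newick = auspice_json_to_newick(example)
```

The sketch does not quote or escape node names, so names containing characters that are special in Newick (parentheses, commas, colons) would need extra handling. Parsing the result with a library such as Bio.Phylo is a reasonable way to check it.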
+1 for a flag that removes the outgroup. [As I mentioned in a related PR](https://github.com/nextstrain/seasonal-flu/pull/89#user-content-fn-1-b358a261c2e384c1dddba7a9ef60b771), we may want to rename `--remove-outgroup` to `--remove-root`, to make the flag name consistent with...
@joverlee521, @j23414, and I looked into this a bit more today and came to the same conclusion that supporting a `--remove-outgroup` flag in `augur refine` would require modification inside TreeTime's...
@tsibley notes that there is a lot of support for storing documentation inside the JSON schema for the config file itself, and there are tools that integrate with Sphinx to...
@matsen That's a good point to clarify! It looks like Trevor originally [applied clustering in a Mathematica notebook](https://github.com/trvrb/antigen/blob/master/example/antigen-analysis.nb), so I think a scikit-learn approach would be a perfect first start...