
getting non-coding mutations using `matUtils summary --translate`

Open jbloom opened this issue 2 years ago • 3 comments

Is there a way to get mutations at non-coding sites along branches the same way that summary --translate does it for coding sites? As far as I can tell, the current command only tracks mutations on branches at coding sites.

jbloom, Mar 30 '23 03:03

`matUtils extract --sample-paths` makes a large file with the path of nucleotide mutations to each sample (leaf node).

`matUtils extract --clade-paths` makes a file with the nucleotide path to each annotated clade or lineage (for the SARS-CoV-2 UShER trees, both Nextstrain clades and Pango lineages are annotated).
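For anyone post-processing the paths file, here is a minimal Python sketch that pulls the nucleotide mutations out of one line. It assumes each line starts with the sample name, followed by a tab and then the path, and that mutations are written as reference base, 1-based position, alternate base (e.g. `C241T`). The example line and the helper name `parse_sample_path_line` are made up for illustration; check them against your actual output before relying on this.

```python
import re

# Nucleotide substitutions written as <ref><1-based position><alt>, e.g. C241T.
MUTATION_RE = re.compile(r"\b([ACGT])(\d+)([ACGT])\b")


def parse_sample_path_line(line):
    """Return (sample, [(ref, pos, alt), ...]) for one line of a paths file.

    Assumption: the sample name comes first, then a tab, then the path.
    Mutations are found by pattern, so the exact delimiters between nodes
    in the path do not matter.
    """
    sample, _, path = line.rstrip("\n").partition("\t")
    mutations = [(ref, int(pos), alt) for ref, pos, alt in MUTATION_RE.findall(path)]
    return sample, mutations


# Hypothetical example line, not taken from real matUtils output.
example = "sampleA\tnode_1:C241T,C3037T node_2:A23403G"
print(parse_sample_path_line(example))
```

Because the mutations are matched by pattern rather than by splitting on a fixed separator, the same function should cope with paths that use a different separator between nodes.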

AngieHinrichs, Mar 30 '23 18:03

I see the usage message for matUtils summary --translate indicates that it should output both AA and nucleotide mutations:

  -t [ --translate ] arg              Write a tsv listing the amino acid and 
                                      nucleotide mutations at each node.

-- @jmcbroome is it implicitly "nucleotide mutations for coding sites only" or was it meant to cover both coding and noncoding sites? If it's coding-only (because it's about AA translation?) then it would be helpful for the usage message to state that explicitly.

AngieHinrichs, Mar 30 '23 18:03

Thanks @AngieHinrichs, you are correct that --sample-paths lets me get comparable information with a little post-processing, so that is great!
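As a rough illustration of that kind of post-processing, the sketch below keeps only the nucleotide mutations that fall outside a set of coding intervals. The function names are hypothetical, and the two intervals in `CDS` are an incomplete placeholder list; in practice the intervals would come from the same gene annotation used for translation.

```python
def is_coding(pos, cds_intervals):
    """True if a 1-based position lies inside any (start, end) interval, inclusive."""
    return any(start <= pos <= end for start, end in cds_intervals)


def noncoding_mutations(mutations, cds_intervals):
    """Keep mutations such as 'C241T' whose position is outside every interval."""
    return [m for m in mutations if not is_coding(int(m[1:-1]), cds_intervals)]


# Placeholder intervals for illustration only; replace with the full set of
# coding regions from your own annotation.
CDS = [(266, 21555), (21563, 25384)]

print(noncoding_mutations(["C241T", "C3037T", "A23403G"], CDS))  # ['C241T']
```

Positions in overlapping genes need no special handling here, since a site counts as coding if it falls in any interval.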

I will keep this issue open for now in case you want to wait for @jmcbroome to answer your question above about clarifying either the output or the usage message for --translate. But from the perspective of my original question, you can consider this issue resolved and close it any time.

jbloom, Mar 31 '23 13:03