snp-sites
Output invariant sites and nucleotide frequencies
In general, phylogenetic programs use invariant sites for likelihood calculations. However, a number of programs, such as RAxML and BEAST, can perform ascertainment bias corrections given the number of invariant sites and the frequencies of nucleotides in the alignment. If SNP-sites output these values, they could be used as direct inputs for RAxML, for example.
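As a rough illustration of the values being requested, the sketch below counts invariant columns per base in a small alignment. It is not how SNP-sites works internally. The function name `constant_site_counts`, the toy sequences, and the choice to treat gaps, `N` and any other non-ACGT character as missing data are all assumptions made for this example.

```python
def constant_site_counts(seqs):
    """Count invariant columns per base.

    Assumption: characters outside A, C, G, T (gaps, N, ...) are treated
    as missing data and do not make a column variable.
    """
    counts = {base: 0 for base in "ACGT"}
    for column in zip(*seqs):
        observed = {c.upper() for c in column} & set("ACGT")
        if len(observed) == 1:
            counts[observed.pop()] += 1
    return counts


# Hypothetical three-sequence alignment, for illustration only.
alignment = ["ACGTA", "ACGTA", "ACCTN"]
print(constant_site_counts(alignment))
```

The sum of the four counts gives the number of invariant sites under these assumptions. A tool that handles ambiguous characters differently would report different numbers.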
I second this suggestion.
Either add a -s (stats?) option to report all sorts of columnar statistics, characters used etc.
OR
Always output this to stderr as part of the logs.
Grand, give me a toy example and I'll sort it out
On 27 September 2017 at 08:53, Torsten Seemann wrote:
I second this suggestion.
Either add a -s (stats?) option to report all sorts of columnar statistics, characters used etc.
OR
Always output this to stderr as part of the logs.
From your example in the README:
sample1 AGACACAGTCAC
sample2 AGACAC----AC
sample3 AAACGCATTCAN
-s (or stderr) would produce:
Input stats:
Alignment length: 12
Proportion Ns: 0.03
Proportion Gap sites: 0.11
Nucleotide frequencies (A,G,C,T): 45.2,12.9,32.3,9.7
Output stats:
SNP alignment length: 3
Number Gap sites (-) introduced: 1
Proportion gap sites: 0.11
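For reference, the numbers in this toy example can be reproduced with a short script. This is only a sketch of one way to define the stats, not a description of SNP-sites itself. The function names and dictionary keys are made up here. It assumes that proportions are taken over all characters in the alignment, that nucleotide frequencies are percentages over called A/G/C/T bases only, and that a column counts as a SNP when it contains more than one distinct base among A, C, G, T.

```python
def alignment_stats(seqs):
    """Input-side stats; proportions are over every character in the alignment."""
    length = len(seqs[0])
    total = length * len(seqs)
    chars = "".join(seqs).upper()
    base_counts = {b: chars.count(b) for b in "AGCT"}
    called = sum(base_counts.values())  # assumes at least one called base
    return {
        "alignment_length": length,
        "proportion_n": round(chars.count("N") / total, 2),
        "proportion_gap": round(chars.count("-") / total, 2),
        "nucleotide_freq_pct": {
            b: round(100 * n / called, 1) for b, n in base_counts.items()
        },
    }


def snp_columns(seqs):
    """Columns holding more than one distinct base among A, C, G, T."""
    return [
        col for col in zip(*seqs)
        if len({c.upper() for c in col} & set("ACGT")) > 1
    ]


seqs = ["AGACACAGTCAC", "AGACAC----AC", "AAACGCATTCAN"]
cols = snp_columns(seqs)
gaps = sum(col.count("-") for col in cols)
print(alignment_stats(seqs))
print(len(cols), gaps, round(gaps / (len(cols) * len(seqs)), 2))
```

Under these assumptions the script gives the same values as listed above: length 12, 0.03 Ns, 0.11 gaps, frequencies 45.2, 12.9, 32.3, 9.7, and a 3-column SNP alignment with 1 gap (0.11).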
Thinking about this more, I obviously came up with a couple of other useful stats. seqtk comp can produce similar stats, but having them in one tool with the speed of SNP-sites would be great.
But output it in a machine-readable format so we can parse it or JSON-ate it.
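One possible shape for such output, sketched in Python: a single JSON object per run, written to stderr so it stays separate from the alignment on stdout. The key names and the `report_stats` helper are hypothetical and only show the idea. The values are the ones from the toy example earlier in this thread.

```python
import json
import sys


def report_stats(stats, stream=sys.stderr):
    """Write stats as one line of JSON and return that line."""
    line = json.dumps(stats, sort_keys=True)
    print(line, file=stream)
    return line


# Hypothetical key names; values taken from the toy example in this thread.
example = {
    "alignment_length": 12,
    "snp_alignment_length": 3,
    "gap_sites_introduced": 1,
}
line = report_stats(example)
```

A downstream script could then recover the values with `json.loads(line)` rather than scraping free-text log lines.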
@andrewjpage
Hello. I was curious if this was ever implemented?