Output invariant sites and nucleotide frequencies

Open EpiDemos82 opened this issue 6 years ago • 5 comments

In general, phylogenetic programs use invariant sites for likelihood calculations. However, a number of programs, such as RAxML and BEAST, can perform ascertainment bias corrections given the number of invariant sites and the frequencies of nucleotides in the alignment. If SNP-sites output these values, they could be used as direct inputs for RAxML, for example.
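As a rough illustration of the two quantities requested here, the Python sketch below counts invariant columns per base and computes overall nucleotide frequencies for the three-sequence example from the snp-sites README (quoted later in this thread). The function names are hypothetical, and treating gaps and N as missing data when deciding whether a column is invariant is an assumption, not necessarily the rule SNP-sites or RAxML applies.

```python
from collections import Counter

BASES = "ACGT"

# Example alignment from the snp-sites README, as quoted in this thread.
ALIGNMENT = ["AGACACAGTCAC", "AGACAC----AC", "AAACGCATTCAN"]


def invariant_site_counts(seqs):
    """Count invariant columns per base.

    Assumption: gaps and ambiguous characters are ignored, so a column
    counts as invariant when exactly one of A, C, G, T occurs in it.
    """
    counts = {base: 0 for base in BASES}
    for column in zip(*seqs):
        observed = {char.upper() for char in column} & set(BASES)
        if len(observed) == 1:
            counts[observed.pop()] += 1
    return counts


def nucleotide_frequencies(seqs):
    """Return the fraction of each base among all A, C, G, T characters."""
    totals = Counter(
        char.upper() for seq in seqs for char in seq if char.upper() in BASES
    )
    called = sum(totals.values())
    if called == 0:
        return {base: 0.0 for base in BASES}
    return {base: totals[base] / called for base in BASES}


print(invariant_site_counts(ALIGNMENT))
print(nucleotide_frequencies(ALIGNMENT))
```

For this alignment the sketch finds four invariant A sites, four C, none for G and one T. How those counts are passed to RAxML or BEAST depends on the program and version, so check the relevant manual.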

EpiDemos82, Sep 23 '17 01:09

I second this suggestion.

Either add a -s (stats?) option to report all sorts of columnar statistics, characters used etc.

OR

Always output this to stderr as part of the logs.

tseemann, Sep 27 '17 07:09

Grand, give me a toy example and I'll sort it out

andrewjpage, Sep 27 '17 07:09

From your example in the README:

sample1 AGACACAGTCAC
sample2 AGACAC----AC
sample3 AAACGCATTCAN

-s (or stderr) would produce:

Input stats:
Alignment length: 12
Proportion Ns: 0.03
Proportion Gap sites: 0.11
Nucleotide frequencies (A,G,C,T): 45.2,12.9,32.3,9.7

Output stats:
SNP alignment length: 3
Number Gap sites (-) introduced: 1
Proportion gap sites: 0.11

Thinking about this more, I came up with a couple of other useful stats. seqtk comp can produce similar stats, but having them in one tool with the speed of SNP-sites would be great.
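The figures in the proposed report can be reproduced with a short Python sketch. This is only an illustration of the arithmetic, not how SNP-sites computes anything: `alignment_stats` and its dictionary keys are hypothetical names, and a SNP column is assumed to be one with more than one distinct A/C/G/T base once gaps and N are ignored.

```python
BASES = set("ACGT")


def alignment_stats(seqs):
    """Compute summary statistics for an alignment and its SNP-only subset.

    Assumption: a column is a SNP site when, ignoring gaps and N, it
    contains more than one distinct base.
    """
    length = len(seqs[0])
    cells = length * len(seqs)

    n_count = sum(seq.upper().count("N") for seq in seqs)
    gap_count = sum(seq.count("-") for seq in seqs)

    # Keep only the columns that vary among unambiguous bases.
    snp_columns = [
        column
        for column in zip(*seqs)
        if len({char.upper() for char in column} & BASES) > 1
    ]
    snp_cells = len(snp_columns) * len(seqs)
    snp_gaps = sum(column.count("-") for column in snp_columns)

    return {
        "alignment_length": length,
        "proportion_n": n_count / cells if cells else 0.0,
        "proportion_gaps": gap_count / cells if cells else 0.0,
        "snp_alignment_length": len(snp_columns),
        "snp_gap_sites": snp_gaps,
        "snp_proportion_gaps": snp_gaps / snp_cells if snp_cells else 0.0,
    }


stats = alignment_stats(["AGACACAGTCAC", "AGACAC----AC", "AAACGCATTCAN"])
for name, value in stats.items():
    print(f"{name}: {value}")
```

Rounded to two decimals, the proportions agree with the values above (1 of 36 characters is N, 4 of 36 are gaps, and 1 of the 9 characters in the SNP alignment is a gap).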

EpiDemos82, Sep 27 '17 14:09

But output in a machine readable format so we can parse or JSON-ate.
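One possible shape for such a report, sketched in Python with the values from the example above. The field names and the `format_stats` helper are hypothetical; nothing here reflects an actual SNP-sites output format.

```python
import json
import sys


def format_stats(input_stats, output_stats):
    """Serialise the two groups of statistics as one JSON document."""
    report = {"input": input_stats, "output": output_stats}
    # sort_keys keeps the output stable so it is easy to diff and test.
    return json.dumps(report, indent=2, sort_keys=True)


report = format_stats(
    {
        "alignment_length": 12,
        "proportion_n": 0.03,
        "proportion_gaps": 0.11,
        "nucleotide_frequencies": {"A": 45.2, "G": 12.9, "C": 32.3, "T": 9.7},
    },
    {
        "snp_alignment_length": 3,
        "gap_sites_introduced": 1,
        "proportion_gaps": 0.11,
    },
)

# Written to stderr so the alignment itself can still go to stdout.
print(report, file=sys.stderr)
```

A downstream script could then read the report with `json.loads` instead of scraping free-text log lines.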

tseemann, Oct 04 '17 05:10

@andrewjpage

Hello. I was curious if this was ever implemented?

slvrshot, Apr 01 '21 19:04