ENH: Add coverage to Nextclade output?

Open tseemann opened this issue 1 year ago • 2 comments

Coverage is normally defined as number of valid called bases divided by length of the virus.

Does Nextclade have a column for this?

The only way I can seem to compute it is to use

  • totalMissing
  • totalNonACGTNs
  • alignmentStart
  • alignmentEnd

along with the reference length.

But this seems error-prone and messy.
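For illustration, the manual calculation described above might look roughly like the following. This is only a sketch under stated assumptions: the function name `coverage_from_columns` is hypothetical, `alignmentStart` and `alignmentEnd` are assumed to be 1-based inclusive reference positions, and `totalMissing` and `totalNonACGTNs` are assumed to count only positions inside the aligned range. None of these assumptions is confirmed in this thread, which is part of why the approach is error-prone.

```python
def coverage_from_columns(total_missing: int,
                          total_non_acgtns: int,
                          alignment_start: int,
                          alignment_end: int,
                          ref_length: int) -> float:
    """Estimate coverage as called bases divided by reference length.

    ASSUMPTIONS (not confirmed by the Nextclade docs quoted here):
    - alignment_start / alignment_end are 1-based and inclusive
    - total_missing and total_non_acgtns fall inside the aligned range
    - deleted positions are not subtracted; whether they should be
      is left open in this sketch
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")

    # Number of reference positions spanned by the alignment.
    aligned_span = alignment_end - alignment_start + 1

    # Remove Ns and other ambiguous (non-ACGTN) characters from the span.
    called = aligned_span - total_missing - total_non_acgtns

    # Guard against negative counts if the assumptions above do not hold.
    return max(called, 0) / ref_length


if __name__ == "__main__":
    # Made-up numbers, purely to show the arithmetic.
    print(coverage_from_columns(80, 20, 51, 950, 1000))
```

Every input carries its own convention (indexing, what counts as missing), so a single dedicated column would remove most of the guesswork.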

Is there an easier way?

tseemann avatar Jul 12 '22 00:07 tseemann

Good point, we don't have a column for this (yet).

Since coverage is a metric of general interest, and computing it by hand involves five numbers, we may want to add it to the TSV output.

I'll turn this into an enhancement proposal.

corneliusroemer avatar Jul 19 '22 16:07 corneliusroemer

So right now we count:

  • Ns
  • ambiguous nucleotides
  • inserted nucleotides
  • deleted nucleotides

It would make sense to also count:

  • sequenced nucleotides minus inserted nucleotides plus deleted nucleotides

That would give us the total number of aligned bases. Coverage could then be calculated as that number divided by the reference length.
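The proposed count and the resulting ratio can be sketched as below. The function and parameter names are hypothetical and do not correspond to existing Nextclade output columns; the numbers in the usage example are made up.

```python
def aligned_bases(sequenced: int, inserted: int, deleted: int) -> int:
    """Aligned bases as proposed above:
    sequenced nucleotides minus inserted plus deleted nucleotides."""
    return sequenced - inserted + deleted


def coverage_from_aligned(sequenced: int, inserted: int, deleted: int,
                          ref_length: int) -> float:
    """Coverage as aligned bases divided by reference length."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    return aligned_bases(sequenced, inserted, deleted) / ref_length


if __name__ == "__main__":
    # 905 sequenced nt, 10 of them insertions, 5 reference positions deleted
    # gives 900 aligned positions on a 1000 nt reference.
    print(aligned_bases(905, 10, 5))
    print(coverage_from_aligned(905, 10, 5, 1000))
```

Note that this counts deleted positions as aligned, so it measures how much of the reference the alignment spans. Ns and ambiguous nucleotides would still need to be subtracted to match the "valid called bases" definition from the original question.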

corneliusroemer avatar Aug 08 '22 20:08 corneliusroemer