
Inconsistency in how Ns are handled for genes vs nucleotide view

Open corneliusroemer opened this issue 2 years ago • 3 comments

When I upload a partial genome, it is not evident in gene view that parts are missing. In the parts that are cut out, the sequence looks identical to the reference.

image

The same is true for the "Ns" (missing) column: it doesn't count the missing parts of the genome.

In nucleotide view, however, unsequenced parts are marked up and distinguishable (grey, with tooltip).

image

It may be good to surface missing beginnings and ends in gene view and in the Ns column tooltip. I think we have the information: it should be available through the alignment start/end. It could be displayed in the same tooltip but under a separate heading (as missing start/end), and maybe counted in parentheses (like we display known frame shifts).
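
As a rough illustration of the counting idea, here is a minimal Python sketch. It is not Nextclade code: the function names are hypothetical, and it assumes the alignment range is given as 0-based, half-open `[alignment_start, alignment_end)` in reference coordinates, which may differ from what Nextclade actually reports.

```python
def missing_ends(alignment_start: int, alignment_end: int, ref_length: int) -> tuple[int, int]:
    """Return (missing_at_start, missing_at_end) in nucleotides.

    Assumption: 0-based, half-open alignment range in reference coordinates.
    """
    if not 0 <= alignment_start <= alignment_end <= ref_length:
        raise ValueError("alignment range must lie within the reference")
    return alignment_start, ref_length - alignment_end


def format_missing_count(n_internal: int, alignment_start: int,
                         alignment_end: int, ref_length: int) -> str:
    """Show the internal N count, with unsequenced ends in parentheses."""
    at_start, at_end = missing_ends(alignment_start, alignment_end, ref_length)
    extra = at_start + at_end
    # Only add the parenthesised part when something is actually missing.
    return f"{n_internal} ({extra})" if extra else str(n_internal)


# Hypothetical values: a 1000 nt reference, aligned from 50 to 900.
print(missing_ends(50, 900, 1000))
print(format_missing_count(12, 50, 900, 1000))
```

Keeping the two numbers separate (internal Ns versus unsequenced ends) would let the tooltip list them under separate headings, as suggested above.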

corneliusroemer avatar Feb 14 '22 17:02 corneliusroemer

To reproduce, you can use this sequence: short.txt

Or this one, which may be better since it contains some S mutations: short.txt

corneliusroemer avatar Feb 14 '22 18:02 corneliusroemer

missing beginnings and ends

They are reported as the alignment start and end. However, it is unclear how to map those to amino acids.
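
One possible mapping, as a minimal sketch under strong assumptions: the gene is a single contiguous forward-strand segment, all coordinates are 0-based and half-open in reference space, and a codon counts as covered only if all three of its bases fall inside the alignment range. The function name `covered_codons` is hypothetical, and spliced genes or frame shifts are not handled.

```python
def covered_codons(gene_start: int, gene_end: int,
                   aln_start: int, aln_end: int) -> tuple[int, int]:
    """Return the half-open codon range [first, last) fully inside the alignment.

    Codon indices are 0-based within the gene. An empty range is returned
    as (k, k) when no codon is fully covered.
    """
    n_codons = (gene_end - gene_start) // 3
    # First codon whose first base is at or after aln_start (ceiling division).
    first = -(-(aln_start - gene_start) // 3)
    # One past the last codon whose third base is before aln_end (floor division).
    last = (aln_end - gene_start) // 3
    # Clamp both ends to the gene.
    first = min(max(first, 0), n_codons)
    last = min(max(last, 0), n_codons)
    if last < first:
        last = first
    return first, last


# Hypothetical gene of 100 codons at [100, 400), alignment covering [130, 350).
first, last = covered_codons(100, 400, 130, 350)
print(f"covered codons: [{first}, {last})")
print(f"missing at gene start: [0, {first}), missing at gene end: [{last}, 100)")
```

Codons only partially covered at the boundaries are treated as missing here; whether they should instead be shown as partially sequenced is a design choice this sketch does not settle.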

ivan-aksamentov avatar Mar 12 '22 20:03 ivan-aksamentov

Spotted someone raising this issue independently in the wild. It would be great if we could display alignmentStart/End in the gene view: https://github.com/cov-lineages/pango-designation/issues/843#issuecomment-1196584005

corneliusroemer avatar Jul 27 '22 15:07 corneliusroemer