nextclade
Unintuitive: Truncated peptides have deletions instead of `X` in alignment
When a gene starts or ends with Ns, a user might expect that section of the gene to be output as `X` instead of `-`, which is what we actually do (see below).
This seems to be the root cause of a bug in covSpectrum: https://github.com/cevo-public/cov-spectrum-website/issues/398
It's of course possible for users to convert leading and trailing `-` into `X` themselves, but I feel like this should be done by Nextalign.
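For anyone who needs the workaround in the meantime, here is a minimal sketch of that conversion in Python. The function name `mask_terminal_gaps` is hypothetical and not part of Nextalign or Nextclade. It assumes the input is a single aligned peptide string in which `-` marks a gap. Note that it cannot tell a genuine terminal deletion from missing sequence, so it simply treats every leading and trailing gap as unknown.

```python
def mask_terminal_gaps(aligned: str, gap: str = "-", mask: str = "X") -> str:
    """Replace leading and trailing gap characters with a mask character.

    Internal gaps are left untouched, since those are more likely to be
    real deletions relative to the reference.
    """
    # Count the run of gap characters at the start of the sequence.
    without_lead = aligned.lstrip(gap)
    lead = len(aligned) - len(without_lead)

    # Count the run of gap characters at the end of what remains.
    core = without_lead.rstrip(gap)
    trail = len(without_lead) - len(core)

    return mask * lead + core + mask * trail


if __name__ == "__main__":
    # Hypothetical aligned peptide with a truncated start and end.
    print(mask_terminal_gaps("--MFV-LL---"))
```

The same function could be applied to nucleotide alignments by passing `mask="N"`.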
We seem to call the truncated parts deletions. Is that on purpose? I feel it would be more appropriate to call them `X`. It's much more likely that a partial gene was uploaded than that this is a real deletion.
Originally posted by @corneliusroemer in https://github.com/nextstrain/nextclade/issues/731#issuecomment-1039339677
If we decide not to change the output, we may want to make this clearer in the docs.
This issue is related to #730 but with a focus on the file output rather than the web view. Switching from `-` to `X` would automatically solve #730, I think.
Hi. I was just wondering how you are prioritizing this issue. Do you plan to fix it any time soon?
These are the huge deletions relative to the reference, and this is how it comes out of alignment. Same thing for nucs in #730.
I am not sure why you guys decided it should be N nucs or X amino acids. Have you just invented that randomly, or do you know of example tools, or any community agreement, that handle incomplete fragments this way? What comes out of mafft and other tools if you feed them your examples?
@chaoran-chen I don't think we'll fix this soon within Nextclade, it's not a simple error that can be fixed in a few lines of code unfortunately.