auspice
ORF1ab is listed in JSON genome_annotations, but not in dropdown for Color by: Genotype
Hi! This might be an Auspice thing, but since I'm using the nextstrain.org/fetch/ function I'll file it here.
Current Behavior
When I view https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/whereIsOrf1AB.json and select Color by: Genotype, the gene menu does not include ORF1ab even though it is the first item in the JSON's genome_annotations list:
... "genome_annotations": { "ORF1ab": { "start": 266, "end": 21555, "strand": "+", "type": "CDS"} , "S": { "start": 21563, "end": 25384, "strand": "+", "type": "CDS"} , "ORF3a": { ...
The Color by: Genotype gene menu lists nucleotide, S, ORF3a, ...
Expected behavior
I would expect the Color by: Genotype gene menu to list nucleotide, ORF1ab, S, ORF3a, ....
How to reproduce
- Go to https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/whereIsOrf1AB.json
- Select Color by: Genotype
- Try to choose ORF1ab from the gene menu... it's missing.
Possible solution
Is 'nucleotide' perhaps replacing the first element instead of being prepended to the list??
Your environment: if browsing Nextstrain online
- Mac OS X 10.15.7
- Browser: Chrome 114.0.5735.198
Additional context
HT @FedeGueli
Hi @AngieHinrichs,
When I open the console on the page, I see the following error coming from Auspice:
[Genome annotation] ORF1ab has length 21290 which is not a multiple of 3
With the latest Auspice updates made by @jameshadfield, you should be able to define the two segments separately in the genome annotations:
"ORF1ab": {
"strand": "+",
"segments":[
{"start": 266, "end": 13468},
{"start": 13468, "end": 21555}
]
},
It would be helpful to make this error more obvious to users by dispatching a warning or error notification.
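To make the arithmetic behind that console error concrete, here is a minimal Python sketch. It assumes coordinates are 1-based and inclusive, which is consistent with the length of 21290 in the message above; `cds_length` is a hypothetical helper for illustration, not an Auspice function.

```python
def cds_length(annotation):
    """Total CDS length in nucleotides, assuming 1-based inclusive coordinates."""
    # A segmented annotation carries a "segments" list; otherwise treat the
    # annotation itself as one segment with "start" and "end".
    segments = annotation.get("segments") or [annotation]
    return sum(seg["end"] - seg["start"] + 1 for seg in segments)


# Single-range annotation from the original JSON.
single = {"start": 266, "end": 21555, "strand": "+", "type": "CDS"}

# Segmented annotation suggested above; position 13468 appears in both
# segments, so it is counted twice.
segmented = {
    "strand": "+",
    "segments": [
        {"start": 266, "end": 13468},
        {"start": 13468, "end": 21555},
    ],
}

print(cds_length(single), cds_length(single) % 3)        # 21290 2
print(cds_length(segmented), cds_length(segmented) % 3)  # 21291 0
```

Because the two segments share position 13468, the segmented form is one nucleotide longer than the single range, which is what brings the total to a multiple of 3.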
Hey @AngieHinrichs - @joverlee521's summarised things perfectly, but note that the segmented annotations can't yet be produced by the augur tools, so you'll have to add them via a short Python script. Here's an example of a Python script I used in testing to manipulate the ncov JSONs to produce segmented annotations for the 2 CDSs which cover the slip site (RdRp and ORF1ab). Internally we debated changing all our ncov datasets from separate ORF1a + ORF1b CDSs to the more correct ORF1ab CDS, but I don't think we will do this as so many people (and pango designations) are using ORF1b numbering; we will probably add the 16 proteins cleaved from the polyproteins tho.
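For readers without access to that script, the following is a rough sketch of what such a post-processing step could look like; it is not the script referred to above. The helper names `segment_cds` and `rewrite_dataset` are hypothetical, the slip-site coordinate 13468 is taken from the segmented example earlier in the thread, and the code assumes the annotations sit under `meta.genome_annotations` in the dataset JSON.

```python
import json


def segment_cds(annotations, gene, slip_site):
    """Replace a single-range CDS with two segments that both include slip_site.

    Mirrors the segmented example in this thread: only "strand" and
    "segments" are written for the rewritten gene.
    """
    old = annotations[gene]
    annotations[gene] = {
        "strand": old["strand"],
        "segments": [
            {"start": old["start"], "end": slip_site},
            {"start": slip_site, "end": old["end"]},
        ],
    }
    return annotations


def rewrite_dataset(in_path, out_path, gene="ORF1ab", slip_site=13468):
    """Load a dataset JSON, segment one CDS, and write the result."""
    with open(in_path) as fh:
        dataset = json.load(fh)
    # Assumption: annotations are stored under meta.genome_annotations.
    segment_cds(dataset["meta"]["genome_annotations"], gene, slip_site)
    with open(out_path, "w") as fh:
        json.dump(dataset, fh)
```

A call such as `rewrite_dataset("in.json", "out.json")` (file names are placeholders) would leave every other annotation untouched and only rewrite the ORF1ab entry.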
Ah, thanks @joverlee521 and @jameshadfield! I wish I'd thought to check the console. OK, I will update the ORF1ab coords in the JSON to list the segments. My code has been adding ORF1ab mutation annotations to the nodes so it didn't occur to me that the ORF1ab coords would really matter for anything besides drawing the genes down below. 🙂