augur
Validate annotations produced from ancestral + translate
I've encountered a bug that took me a very long time to figure out. `augur export` reported the following error:
Validating schema of 'auspice/monkeypox_global.json'...
ERROR: 'nuc' is a required property. Trace: properties - meta - properties - genome_annotations - required
Validation of 'auspice/monkeypox_global.json' failed.
------------------------
Validation of auspice/monkeypox_global.json failed. Please check this in a local instance of `auspice`, as it is not expected to display correctly.
------------------------
It turns out that `export` requires `nuc` annotations, and these usually come in through `aa_mut.json` from `augur translate`.
I was reading in annotations from a `.gff` into `translate`, something that's theoretically supported. However, it's not actually possible to read in `nuc` annotations in the current implementation.
It would have sped up debugging considerably if `augur translate` had warned me (or even errored) when it realised that it was lacking `nuc` annotations.
I'd propose an error if `nuc` is not output into `aa_mut.json`:
[Error] Could not read in `nuc` annotations. Please check the annotation in your input file. For `.gff` the line needs to look like this:
MT903344.1 Genbank source 1 197233 . + . locus_tag=nuc
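A minimal sketch of the proposed check, assuming the annotations are held in a plain dict keyed by feature name (as in the `annotations` block of `aa_mut.json`). The names `check_nuc_annotation` and `AnnotationError` are hypothetical and not part of augur's API.

```python
class AnnotationError(Exception):
    """Raised when required genome annotations are missing (hypothetical)."""


def check_nuc_annotation(annotations: dict) -> None:
    """Fail early in translate if 'nuc' is absent, instead of at `augur export` time."""
    if "nuc" not in annotations:
        raise AnnotationError(
            "Could not read in `nuc` annotations. Please check the annotation in your "
            "input file. For `.gff` the line needs to look like this:\n"
            # GFF3 columns are tab-separated
            "MT903344.1\tGenbank\tsource\t1\t197233\t.\t+\t.\tlocus_tag=nuc"
        )


# Hypothetical annotations read from a GFF that only contains CDS features
annotations = {"F1L": {"start": 1000, "end": 2000, "strand": "-"}}
try:
    check_nuc_annotation(annotations)
except AnnotationError as err:
    print(err)
```

Raising at this point would surface the problem where the GFF is parsed, rather than as a schema error several steps later.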
Related to #881
I think this issue arose as part of this Slack conversation. @corneliusroemer, am I correct in this?
(1 year later...)
The annotations schema now requires 'nuc' to be present (d6246ca052478446f7179e230e842a34f93e4cd4); however, neither `augur ancestral` nor `augur translate` validates its outputs. Reading any node-data file (via `NodeDataReader`) with an "annotations" block will also validate against the schema, although in this case the error will still first be encountered in `augur export v2`.
Conceptually we could have the annotations from `ancestral` define 'nuc' and `translate` define the CDSs, and they'll be merged in `augur export`; however, I think it's sensible to require `translate` to add a 'nuc' block, which is why I made it a required property. If `augur export` sees multiple `annotations.nuc` entries it should really ensure they are the same length! (The JSON merging happens within `NodeDataReader`.)
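The length check suggested above could look roughly like this sketch. It assumes each node-data file's `annotations` block is a dict whose `nuc` entry has 1-based inclusive `start`/`end` keys; `merge_annotations` and `nuc_length` are hypothetical helpers, not how `NodeDataReader` is actually implemented.

```python
from typing import Dict, List


def nuc_length(nuc: Dict) -> int:
    # Assumes 1-based, inclusive coordinates (GFF convention)
    return nuc["end"] - nuc["start"] + 1


def merge_annotations(blocks: List[Dict]) -> Dict:
    """Merge `annotations` blocks from several node-data files.

    Raises ValueError if two blocks define `nuc` with different lengths;
    for every other feature, later blocks overwrite earlier ones.
    """
    merged: Dict = {}
    for block in blocks:
        for name, feature in block.items():
            if name == "nuc" and "nuc" in merged:
                old, new = nuc_length(merged["nuc"]), nuc_length(feature)
                if old != new:
                    raise ValueError(f"Conflicting 'nuc' lengths: {old} vs {new}")
            merged[name] = feature
    return merged


# Hypothetical blocks, as if read from ancestral and translate outputs
ancestral = {"nuc": {"start": 1, "end": 197233, "strand": "+"}}
translate = {
    "nuc": {"start": 1, "end": 197233, "strand": "+"},
    "F1L": {"start": 1000, "end": 2000, "strand": "-"},
}
merged = merge_annotations([ancestral, translate])
print(sorted(merged))  # ['F1L', 'nuc']
```

Comparing only lengths (rather than the full `nuc` dicts) is a deliberate simplification here; a stricter merge might also compare `start` and `strand`.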
Just a note, I ran into this issue working on my PRRSV dataset (https://github.com/mazeller/NextClade_Datasets/tree/main/prrsv_yimim_v3). I needed to append the following line to my GFF manually.
DQ478308.1 Genbank source 1 603 . + . locus_tag=nuc
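Rather than writing this line by hand, it could be generated from the reference sequence ID and length. A minimal sketch, assuming the same column layout as the line above (GFF3 columns are tab-separated); `nuc_source_line` is a hypothetical helper.

```python
def nuc_source_line(seqid: str, length: int, source: str = "Genbank") -> str:
    """Build a GFF3 feature line spanning the whole genome, tagged as `nuc`."""
    # seqid, source, type, start, end, score, strand, phase, attributes
    columns = [seqid, source, "source", "1", str(length), ".", "+", ".", "locus_tag=nuc"]
    return "\t".join(columns)


line = nuc_source_line("DQ478308.1", 603)
print(line.split("\t"))
```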
> however I think it's sensible to require translate to add a 'nuc' block, which is why I made it a required property
As of 1d17699e960d3805a0a586d7ccf3e9a550d53ac9 (in master, but not yet released) `augur translate` will always export this. (I missed this issue when scanning; it's very similar to #953.)
> Just a note, I ran into this issue working on my PRRSV dataset (https://github.com/mazeller/NextClade_Datasets/tree/main/prrsv_yimim_v3). I needed to append the following line to my GFF manually.
P.S. Recent augur PRs (merged but not released) will fix this: we'll now read the `nuc` coordinates from the sequence-region pragma in your GFF (`##sequence-region DQ478308.1 1 603`).
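Reading the `nuc` coordinates from that pragma can be sketched as follows. This is not augur's implementation; it assumes the GFF3 pragma form `##sequence-region seqid start end`, takes only the first such pragma, and returns a dict with hypothetical keys (`start`, `end`, `strand`). `nuc_from_sequence_region` is a hypothetical name.

```python
from typing import Dict, Iterable, Optional


def nuc_from_sequence_region(lines: Iterable[str]) -> Optional[Dict]:
    """Return a 'nuc' annotation from the first ##sequence-region pragma, if any."""
    for line in lines:
        if line.startswith("##sequence-region"):
            parts = line.split()
            # Expected: ["##sequence-region", seqid, start, end]
            if len(parts) != 4:
                raise ValueError(f"Malformed pragma: {line!r}")
            _, _seqid, start, end = parts
            return {"start": int(start), "end": int(end), "strand": "+"}
    return None  # no pragma: caller would fall back to a locus_tag=nuc feature


gff = [
    "##gff-version 3",
    "##sequence-region DQ478308.1 1 603",
]
print(nuc_from_sequence_region(gff))  # {'start': 1, 'end': 603, 'strand': '+'}
```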