augur
fix: translate `-` strand annotation
Annotations output by `augur translate` always contained `+`, irrespective of what the input GFF or GenBank file specified and Biopython returned.
This bug was down to the assumption that Biopython's `feat.location.strand` returns a boolean, when it in fact returns a numeric `1` or `-1` for the positive/negative strand.
In order to be backwards compatible and output `+` by default, for example when no strand directionality is given, I reversed the test: it now outputs `-` if and only if the strand is `-1`, and `+` otherwise.
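To illustrate why the boolean assumption fails, here is a minimal sketch. The function names are hypothetical, and `strand_symbol_buggy` is only a guess at the shape of the old test, not a copy of Augur's code. Both `1` and `-1` are truthy in Python, so a truthiness check can never distinguish them.

```python
def strand_symbol_buggy(strand):
    # Hypothetical reconstruction of the old test: treats the strand as a
    # boolean. Both 1 and -1 are truthy, so either value yields "+".
    return "+" if strand else "-"


def strand_symbol(strand):
    # Reversed test as described above: "-" only for an explicit -1.
    # Anything else (1, or None when no direction is given) falls back to "+".
    return "-" if strand == -1 else "+"


for strand in (1, -1):
    print(strand, strand_symbol_buggy(strand), strand_symbol(strand))
```

With the reversed test, a missing strand (`None`) still maps to `+`, which matches the backwards-compatible default described above.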
Testing
The change is minimal; it's a very localized bug fix.
There is almost no testing for `augur translate`, so this would be good to set up at some point, but it shouldn't block this important fix (it's directly relevant for all MPX builds). Maybe @victorlin can add this to the backlog of Augur work?
This makes Auspice display pos and neg strands correctly as shown below:
there is something wrong with the tests though...
Sorry for coming in, I just want to ask: does `augur translate` currently assume that all CDS in the GenBank file are on the forward strand?
I am working on orf virus, and some of the genes are actually on the reverse strand, and I notice that the amino acids are different from what is present in the GenBank translations.
Hey @ZarulHanifah -- this is purely a visualisation fix (whether the gene was displayed above/below the line in auspice), it doesn't change the actual AA translations at all.
Ouch, let's get this functionality fixed! (The failing test logs have expired, so I can't see what's wrong there.)
I am confused as to how we do this correctly for TB 👇 as running the (VCF) example in `./tests/builds/tb` has `feat.location.strand` of `-1` or `1` (and thus all are exported as being on the positive strand).
We did, apparently, but don't any more. That TB build hasn't been updated since 2018, and it contains both -1 and +1 strand values:
```
$ curl -fsSL --compressed https://data.nextstrain.org/tb_global_meta.json | jq -c '.annotations | to_entries | .[] | {key, strand: .value.strand}'
{"key":"gyrB","strand":1}
{"key":"gyrA","strand":1}
{"key":"Rv0010c","strand":-1}
{"key":"Rv0011c","strand":-1}
{"key":"Rv0026","strand":1}
{"key":"Rv0039c","strand":-1}
{"key":"ponA1","strand":1}
{"key":"Rv0147","strand":1}
{"key":"mce1R","strand":-1}
{"key":"lprO","strand":-1}
…
```
Looks like 74125f5cf9fdc1b2b9e20687be83c5d11ec3e580 is what broke that, first released in Augur 6.0.0, when we moved from `+1`/`-1` in v1 JSONs to `+`/`-` in v2.
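As a rough way to spot this regression in an exported dataset, the sketch below tallies strands across an annotations mapping, accepting either the v1 numeric values or the v2 symbols. The function names and the `"+"` fallback are assumptions for illustration, not Augur's API. The sample entries are taken from the `jq` output earlier in this thread.

```python
from collections import Counter

# v1 JSONs stored numeric strands; v2 JSONs store "+" / "-".
V1_TO_V2 = {1: "+", -1: "-"}


def normalize_strand(value):
    """Return the v2 symbol for a v1 numeric or v2 symbolic strand value."""
    if value in ("+", "-"):
        return value
    # Assumption: anything unrecognised (e.g. None) defaults to "+".
    return V1_TO_V2.get(value, "+")


def strand_counts(annotations):
    """Count how many annotations fall on each strand."""
    return Counter(
        normalize_strand(entry.get("strand")) for entry in annotations.values()
    )


# A few entries from the 2018 TB dataset shown above (v1-style values).
annotations = {
    "gyrB": {"strand": 1},
    "Rv0010c": {"strand": -1},
    "ponA1": {"strand": 1},
    "mce1R": {"strand": -1},
}

counts = strand_counts(annotations)
print(dict(counts))
```

If a genome known to have reverse-strand genes reports a count of zero for `-`, that would be consistent with the bug described in this PR.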