
fix: translate `-` strand annotation

Open corneliusroemer opened this issue 2 years ago • 6 comments

Annotations output by `augur translate` always contained `+` as the strand, irrespective of what the input GFF or GenBank file specified and what Biopython returned.

The bug came from assuming that Biopython's `feat.location.strand` returns a boolean, when in fact it returns a numeric 1 or -1 for the positive or negative strand. Both values are truthy, so every stranded feature was written as `+`.

To stay backwards compatible and output `+` by default, for example when no strand directionality is given, I reversed the test: it now writes `-` if and only if strand is -1, and `+` otherwise.
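A minimal sketch of the difference, using plain integers in place of Biopython's `feat.location.strand`. The function names are hypothetical, and the "buggy" version is only an assumption about what a truthiness-based check looks like, not a copy of the original Augur code:

```python
def strand_symbol_buggy(strand):
    # Treats strand as a boolean. Both 1 and -1 are truthy in Python,
    # so a negative-strand feature is still reported as "+".
    return "+" if strand else "-"


def strand_symbol(strand):
    # Reversed test: "-" only for an explicit -1, "+" for everything else
    # (1, or a missing value such as None).
    return "-" if strand == -1 else "+"


for value in (1, -1, None):
    print(value, strand_symbol_buggy(value), strand_symbol(value))
```

The reversed test keeps `+` as the default, which is the backwards-compatible behaviour described above.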

Testing

The change is minimal, it's a very localized bug fix.

There is almost no testing for augur translate, so this would be good to set up at some point, but it shouldn't block this important fix (it's directly relevant for all MPX builds). Maybe @victorlin can add this to the backlog of Augur work?

This makes Auspice display pos and neg strands correctly, as shown below: [image]

corneliusroemer avatar Jun 07 '22 16:06 corneliusroemer

there is something wrong with the tests though...

rneher avatar Jun 07 '22 16:06 rneher

Sorry for coming in, just want to ask: does augur translate currently assume that all CDS in the GenBank file are on the forward strand?

I am working on orf virus, and some of the genes are actually on the reverse strand, and I do notice that the amino acids are different from what is present in the GenBank translations.

ZarulHanifah avatar Jul 29 '22 19:07 ZarulHanifah

> Sorry for coming in, just want to ask: does augur translate currently assume that all CDS in the GenBank file are on the forward strand?

Hey @ZarulHanifah -- this is purely a visualisation fix (whether the gene was displayed above/below the line in auspice), it doesn't change the actual AA translations at all.

jameshadfield avatar Sep 14 '22 03:09 jameshadfield

Ouch, let's get this functionality fixed! (The failing test logs have expired, so I can't see what's wrong there.)

I am confused as to how we do this correctly for TB 👇 as running the (VCF) example in ./tests/builds/tb has feat.location.strand of -1 or 1 (and thus all are exported as being on the positive strand).

[image]

jameshadfield avatar Sep 14 '22 03:09 jameshadfield

> I am confused as to how we do this correctly for TB 👇

We did, apparently, but don't any more. That TB build hasn't been updated since 2018, and it contains both -1 and +1 strand values:

$ curl -fsSL --compressed https://data.nextstrain.org/tb_global_meta.json | jq -c '.annotations | to_entries | .[] | {key, strand: .value.strand}'
{"key":"gyrB","strand":1}
{"key":"gyrA","strand":1}
{"key":"Rv0010c","strand":-1}
{"key":"Rv0011c","strand":-1}
{"key":"Rv0026","strand":1}
{"key":"Rv0039c","strand":-1}
{"key":"ponA1","strand":1}
{"key":"Rv0147","strand":1}
{"key":"mce1R","strand":-1}
{"key":"lprO","strand":-1}
…
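For anyone without `jq`, the same check can be sketched in Python. This assumes only the structure the filter above relies on (a top-level `annotations` object whose values carry a `strand` field); the function name and the inline sample are illustrative, with the two sample entries taken from the output above:

```python
def strands_by_gene(meta):
    # Map each annotation name to its strand value, mirroring the jq filter:
    # .annotations | to_entries | .[] | {key, strand: .value.strand}
    return {name: info.get("strand") for name, info in meta["annotations"].items()}


# Two entries copied from the output above; a real run would load the
# downloaded JSON with json.load() instead.
sample = {"annotations": {"gyrB": {"strand": 1}, "Rv0010c": {"strand": -1}}}

for gene, strand in strands_by_gene(sample).items():
    print(gene, strand)
```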

tsibley avatar Sep 15 '22 21:09 tsibley

Looks like 74125f5cf9fdc1b2b9e20687be83c5d11ec3e580 is what broke that, first released in Augur 6.0.0, when we moved from +1/-1 in v1 JSONs to +/- in v2.

tsibley avatar Sep 15 '22 21:09 tsibley