
incorporate frameshifts

Open nathandunn opened this issue 9 years ago • 20 comments

====

  • (mostly . . . allow exons to overlap), just for specific isoforms ====

As @cmdcolin noted, we have a bunch of code for frameshifts in the codebase, but we do not appear to actually be able to add them.

Something for us to discuss at some point.

  • [ ] add frameshift object via drop-down (-1,+1, -2?)
  • [ ] indicate visually
  • [ ] translate correctly
  • [ ] export in GFF3 and FASTA
  • [ ] proper calculation of CDS
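To make the "translate correctly" item concrete, here is a minimal Python sketch of frameshift-aware translation. It is not Apollo code: the function name, the `shifts` mapping, and the toy sequence are all assumptions made for illustration, and it only handles the forward strand with the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_with_frameshifts(seq, shifts=None):
    """Translate seq from index 0, stopping at the first stop codon.

    shifts (hypothetical model) maps the 0-based index where the next codon
    would normally start to an offset such as -1 or +1 that is applied
    before that codon is read.
    """
    pending = dict(shifts or {})  # each shift is applied at most once
    protein = []
    pos = 0
    while True:
        pos += pending.pop(pos, 0)
        if pos < 0:
            raise ValueError("frameshift moves before the start of the sequence")
        if pos + 3 > len(seq):
            break
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
        pos += 3
    return "".join(protein)


toy = "ATGAAATTTTAGGGTAA"
print(translate_with_frameshifts(toy))           # MKF (stops at TAG)
print(translate_with_frameshifts(toy, {9: -1}))  # MKFLG (base 8 is read twice)
print(translate_with_frameshifts(toy, {9: +1}))  # MKFRV (base 9 is skipped)
```

Keying the shift on a specific base is what makes the -1, +1, and -2 cases the same operation with different offsets.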

===

Output both annotations (original and pre-frameshifted).

nathandunn avatar Jul 16 '15 20:07 nathandunn

@monicacecilia Please comment and then assign to me with recommendations when you are testing.

nathandunn avatar Aug 07 '15 16:08 nathandunn

It's all coming back.

Desktop Apollo had a function that allowed curators to shift the frame of translation +1 or -1 from the base pair where the cursor stood.

This is what it looked like: screen shot 2016-01-21 at 5 46 36 pm

In some organisms, cells naturally shift the frame of translation to express a gene (the ribosome skips, basically). This was common in some Drosophila genes and the request was made way back when. For an example see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC108870/

This code should be re-implemented, but this is not of the highest priority at this moment. I'm punting this down to the time after coordinate transformation and variant annotation are implemented and working as desired.

monicacecilia avatar Jan 22 '16 01:01 monicacecilia

:+1:

nathandunn avatar Jan 22 '16 02:01 nathandunn

@selewis & @nathandunn: It will be very useful to come back to this ticket and work on implementing this functionality in the near future.

monicacecilia avatar Aug 17 '17 21:08 monicacecilia

Very common in phages, but sometimes the frameshifts are more than just ±1, e.g. http://www.sciencedirect.com/science/article/pii/S1097276504005398

Incredibly important to CPT's use case I believe. cc @moffmade

The lack of support is a somewhat complex issue, since JBrowse will not render to-spec GFF3 that includes frameshifts. For the spec, see https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md (you'll have to ctrl-f for "programmed frameshift").

hexylena avatar Aug 18 '17 09:08 hexylena

What is the status of resolving this issue?

jimhu-tamu avatar Dec 21 '17 18:12 jimhu-tamu

I think we've deferred due to our time constraints. However, if this is something you'd be interested in implementing, we'd be more than happy to work with you on it. Also, we are doing a hackathon in January if that would be convenient.

nathandunn avatar Dec 21 '17 18:12 nathandunn

I'm asking based on the class that @erasche was referring to, which we will start teaching again in January. @moffmade is now working with us on continuing Eric's work, and the timing is bad for us to attend. But I just sent him the link to look at the agenda. We have an even more critical Apollo problem that he will add an issue here for soon.

jimhu-tamu avatar Dec 21 '17 19:12 jimhu-tamu

@moffmade is welcome to join us remotely as well, but that will be a busy time for teaching. Yeah, let us know about the critical problems and your timeline for teaching. Our hope is that we can get @moffmade doing a few of these fixes himself after getting somewhat familiar with the stack, if he has time.



nathandunn avatar Dec 21 '17 19:12 nathandunn

  • just fyi y'all, it's @moffmade (looks like moffmad isn't a user on github so no one is getting those pings)
  • It was far outside my ability to implement such a feature since it requires changes across jbrowse and apollo. @moffmade is a great developer (way more competent than me) but even at my best after a while working with apollo it wasn't something I was able to do, and he's much newer to jbrowse/apollo world. I'd say essentially that CPT needs apollo developer help on this one if it could be done in time for the class.

hexylena avatar Dec 21 '17 21:12 hexylena

oops. Updated my reply above for Corey's correct id.

jimhu-tamu avatar Dec 21 '17 21:12 jimhu-tamu

@jimhu-tamu / @MoffMade, @erasche's assessment is probably correct. I'll be available to do a remote call on the 4th if it's something you might be interested in pursuing. However, I would estimate 2-4 weeks even with our help, if I remember this issue correctly.

Maybe @erasche can make some introductions off-line. We can make arrangements over the break (and am happy to point folks to resources).

nathandunn avatar Dec 22 '17 06:12 nathandunn

Offline introduction? I'm physically unavailable until February (holiday).

hexylena avatar Dec 22 '17 14:12 hexylena

Sorry, I meant off GitHub, via email. No need for travel! I'll wait until the Galaxy conference to see you in person.

Nathan


nathandunn avatar Dec 22 '17 17:12 nathandunn

Back at work, sure, available on the 4th if you need a videoconf or something for more detailed explanation.

hexylena avatar Feb 02 '18 09:02 hexylena

https://github.com/TAMU-CPT/training-material/blob/bich464/topics/genome-annotation/tutorials/annotating-tmp-chaperone-frameshifts/tutorial.md

nathandunn avatar Nov 02 '18 19:11 nathandunn

From notes:

  • Tutorial we have written to help our students get through frameshift annotation https://github.com/TAMU-CPT/training-material/blob/bich464/topics/genome-annotation/tutorials/annotating-tmp-chaperone-frameshifts/tutorial.md
  • Some examples in NCBI: https://www.ncbi.nlm.nih.gov/nuccore/1428093527 (GenBank: MH321492.1: /locus_tag="Lorac_015" is the frameshift protein, compared to /locus_tag="Lorac_014")
  • Though frameshifts were previously supported in the desktop version, supporting -1 and the others (-2, +1) will require some retooling in web A
  • Example frameshift in E. coli K-12 https://www.ncbi.nlm.nih.gov/nuccore/NC_000913.3 Search for dnaX. Two products, tau and gamma

nathandunn avatar Dec 05 '18 21:12 nathandunn

Treat similarly to a read through stop codon, but base specific

nathandunn avatar Apr 22 '19 22:04 nathandunn

Per discussions with the TAMU group, @meiliucpt will add some export examples.

  • [ ] In general the frameshift is tied to the mRNA (which affects translation, will be visually indicated etc.)
  • [ ] frameshift will be exported as part of the mRNA in GFF3, etc. (awaiting what that might look like)
  • [ ] frameshifted mRNAs will be part of a separate gene
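As one way to picture a frameshift that is "tied to the mRNA", the sketch below attaches base-specific frameshift records to a transcript and derives a GenBank-style `join(...)` location from them. The export format was still undecided at this point in the thread, so this shows only one possibility. The class names, fields, and coordinates are hypothetical, and only the forward strand is handled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frameshift:
    after: int   # 1-based coordinate of the last base read before the shift
    offset: int  # -1, +1, -2, ...


@dataclass
class Transcript:
    name: str
    cds_start: int  # 1-based, inclusive
    cds_end: int    # 1-based, inclusive
    frameshifts: list = field(default_factory=list)

    def cds_segments(self):
        """Split the CDS into segments at each frameshift (forward strand)."""
        segments = []
        start = self.cds_start
        for fs in sorted(self.frameshifts, key=lambda f: f.after):
            segments.append((start, fs.after))
            start = fs.after + 1 + fs.offset  # -1 re-reads one base, +1 skips one
        segments.append((start, self.cds_end))
        return segments

    def genbank_location(self):
        """Render the segments as a GenBank-style location string."""
        parts = [f"{s}..{e}" for s, e in self.cds_segments()]
        return parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"


shifted = Transcript("toy-shifted", 1, 17, [Frameshift(after=9, offset=-1)])
print(shifted.cds_segments())      # [(1, 9), (9, 17)]
print(shifted.genbank_location())  # join(1..9,9..17)
```

Storing the shift on the transcript and deriving the segments on export would keep the frameshift base-specific while still producing overlapping CDS segments for tools that expect them.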

nathandunn avatar Apr 26 '19 20:04 nathandunn

Using NCBI: https://www.ncbi.nlm.nih.gov/nuccore/1428093527 as an example (GenBank: MH321492.1: /locus_tag="Lorac_015" is the frameshifted protein, and /locus_tag="Lorac_014" reads through the slippery sequence to the ORF's normal stop codon),

GenBank record for the frameshift protein and its non-shifted version should look like this:

image

The converted gff3 (converted using our GenBank - GFF3 converter which is from BioPerl) looks like this:

image

The frameshifted and "normal" reading frames are represented as 2 separate genes.

In the frameshifted feature (Lorac_015), the GFF3 has the gene (Shine-Dalgarno + CDS) as parent, with the mRNA (1st base of CDS to last base of CDS) and Shine-Dalgarno as children. Under the mRNA are 2 CDS and 2 exon features. We're not sure how the frameshift is represented: are the 2 CDSs or 2 exons automatically merged into a single protein sequence when read?

Based on what we see, it looks like we've been representing frameshifts as basically 2 exons which then get merged (i.e., like 2 exons separated by an intron that is -1 bp in length), which is derived from how these are represented in GenBank. If we switched to representing these as an mRNA with a frameshift in it, that could be done but would be a departure from the current process and we'd need to make sure we had a way to place these features in Apollo and export them again in a way that GenBank can handle. I hope this explanation makes sense. Let me know if you have questions.
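The "intron that is -1 bp in length" idea can be illustrated with a short Python sketch. It assumes a consumer that simply concatenates CDS segments in order before translating; whether Apollo, JBrowse, or the BioPerl converter actually does this is not established here. The coordinates and sequence are a toy example, not the Lorac_014/Lorac_015 records.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def segment_gaps(segments):
    """Length of the 'intron' between consecutive 1-based inclusive segments."""
    return [nxt_start - prev_end - 1
            for (_, prev_end), (nxt_start, _) in zip(segments, segments[1:])]


def join_segments(genome, segments):
    """Concatenate forward-strand segments given as 1-based inclusive pairs."""
    return "".join(genome[start - 1:end] for start, end in segments)


def translate(cds):
    """Translate in frame 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


genome = "ATGAAATTTTAGGGTAA"
normal = [(1, 12)]            # reads through to the ORF's normal stop
shifted = [(1, 9), (9, 17)]   # second segment starts on the first one's last base

print(segment_gaps(shifted))                      # [-1]
print(translate(join_segments(genome, normal)))   # MKF
print(translate(join_segments(genome, shifted)))  # MKFLG
```

Under that assumption the two CDS features do merge into a single protein sequence: the shared base is read twice, once as the end of the first segment and once as the start of the second.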

meiliuCPT avatar Apr 27 '19 22:04 meiliuCPT