extract-transcript-regions

[Suggestion] Option for index/offset and full mRNA transcripts

Open vizkidd opened this issue 3 years ago • 2 comments

I love the scripts, but I had to tweak them a bit because of a mismatch between the GTF and the genome index. To find the number of bases by which the coordinates were offset, I had to compare my sequences with the ones from Ensembl. It would be nice if I could pass the offset to the script as a parameter instead of hardcoding it.
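To make the request concrete, here is a minimal sketch of what an offset parameter could look like. It is not part of the existing scripts: the `--offset` flag and the names `shift_gtf_line` and `build_parser` are hypothetical, and it assumes a standard tab-separated GTF with start and end in columns 4 and 5.

```python
import argparse


def shift_gtf_line(line, offset):
    """Return a GTF line with start/end (columns 4 and 5) shifted by offset.

    Comment lines, blank lines, and lines with fewer than 9 columns are
    returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line
    fields[3] = str(int(fields[3]) + offset)  # start coordinate
    fields[4] = str(int(fields[4]) + offset)  # end coordinate
    return "\t".join(fields) + newline


def build_parser():
    """Hypothetical command line: a GTF path plus an integer --offset."""
    parser = argparse.ArgumentParser(
        description="Shift GTF coordinates by a fixed number of bases.")
    parser.add_argument("gtf", help="input GTF file")
    parser.add_argument("--offset", type=int, default=0,
                        help="bases to add to every start/end (may be negative)")
    return parser


def shift_gtf_file(path, offset):
    """Yield every line of the GTF at `path` with coordinates shifted."""
    with open(path) as handle:
        for line in handle:
            yield shift_gtf_line(line, offset)
```

A caller would parse the arguments with `build_parser().parse_args()` and write the lines from `shift_gtf_file(args.gtf, args.offset)` to a new file before running the extraction.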

Second, could you include an option to extract full-length transcripts as well?

Thanks!

vizkidd avatar Sep 01 '20 11:09 vizkidd

Thanks for raising this issue and glad the scripts are helpful for you. I'm not sure I'm understanding how best to address the GTF/genome mismatch - could you please link an example?

To your second question, the "exons" files should contain coordinates for full-length transcripts (all exons). Does this work?

stephenfloor avatar Sep 23 '20 04:09 stephenfloor

Hi there,

I am using the provided scripts to extract regulatory sequences of genes from a genome file, but I get multiple sequences for a single gene and I am not sure why. The output is shown below:

NC_048263.1:49619-51244 gcattttgctagtaactatatatttgtataaatattgtTATTGTATAACTCTATACCGTTTATCTAGCGCTAGATTTTAATTGATCTGACAAAACAATGATATATAAATATTATGTGTTGCACATAACACATAGAAATACCGAGAAAGCTAGGATCTTTTTCAAGATTTAGCAAAGGTCCGGATGGAGCGTAGAAGGCAGGGAGGCGGAGTTGTTTAGGGGCGGTGGCACAAGCTGCAGGTCCGTGGCGGCTGCTGGAGCTTTGTCGTGGTGGCGTCGATTCGGCGGTGGCTGCAGCTTGATTCACATCGACAGTGCTCGATTTCACGGCGACAGCAAAGTCAGCacctttctcctcttcttcagGTCGCGGCAGCACACAAGCACCATCAAATCCCTCGGCATCCAGCAACCAAATCAGCAACTACTCGCCGCTGCGTTGTGCTCCAGTGCTCGTCGCCTCGTCGGCCTGGTAGTAGGCAACAGAAGCATGGCAGCCCGGCAGTTGAGCGCCTCGTGCAGCAGAGCGGAGGCGTGCTGCTGTCGTCGGCACACCCGTAGTCTGCACCGAGGCGTGCAGTGGCACATCGTCGCCTCGAGGCCGGTGTAGATGGCCTGGCCAGGGCCGGTTCTGGGGCTAGGCCAGAGGGGCGACGGCCGGGGGCCCAAGCCAGGAGGGGGCCCAGGAGTTTATACACAAATGAAGTAGGTGACTAGCAGGCTAAAGTGAAGTCTAATGATTTAAACATTGGTTCTTAGGACTTGTTCAGTAATTGATGCACAATTTATGTCGCTAGTCACTAGGAAATGATTAGCTCACCAGGCGTGCCTTCACACATTGAGTCCCTCAATGGCTCAATCCATTGCTTATTAGTTATTTACTTAAAGCATGCTAAATTGTAGTTCTAAACTTCTAATTCCAAGTGTTACTTTCTTTCAAAATTTAGGTGTTACAATATTTCGCTCCTGTCATTGAACGTTTGGCCCACCTGTCATTGAGACAAAACGCACCAAAGTTAACTTTACTACAGAAACAGTCCTCCTTTTACTATAGGGAGAGATGCCTGAGGGTCGAACGGACAGAGCAGATCAGCCACCAACACACATCCTACGCGCACAAAATCAGATGACGGAGAGCCTCGAACTACACGGGTGGGAGTATAATGCACTCAATCCATCATCTCCGGCTACCGTATGTATGTACACCTTCAAGTAACTAATCAATCCCATCTCTGCGTGATGGTCGGTCGGCCCAGCTCCATCCAAGGGCACATCATCATTCGTGGGTGCATTTCTGGGCCGGGCCGTCCCAACAGAATGAAGCAGGGCCCAACCCATCGGGTGAGGCTGACACTGCCTCTCCACTTTTCGAAATGGTTGGTTGGTGCTGCAACGTGCAACCGGAATCCACCGTTGCAACCACCAGTCACGGTCAAGACTGTCAGACGAGCAAGTGAGCAAAGCATGCGCTCCAGTTAGCTGCAGCAACTCCGGTCTCTGTCTCTGTGAATTCAATATAAATTCGCTCCTAGTGGTGGTGGCCATCCATCGATCGATCTCAGCAATACCAGCAAGCAGCAATAATCTGAAACAAACCTATATATCAGTCCGTGCGTCTGCtgatcg
NC_048263.1:54909-55138 CCGGATGGGGAACGGATTGGGTTCCTGGAGTAGCTCGAATGTGAATTGGAGGCTCGATTCGGAGATCCCAAGCTGCCCGGGGACCAAGCGGCGGGGTTTGACCGGGGAAGCCTAGGGTTTTGACCGCCGATCTGGTCGGGAGAGGCCAGAGGTGGGGTGTGGGGGAACGAACGGCCGCGGcacgaggaagaagacgaagaggTAGCCGTTGGGCTGGGTTTGCTCCGCT
NC_048263.1:92024-92121 CTtactcctccctccttctcttccgGGCTCTGCTGCGTCTGCGTCTGCGTCTGCGTCTGCGTGGgcaccactccactccactccagctccagctcca
NC_048263.1:95962-95969 cTATGCT

Could you please help explain the reason or provide any approaches to clean the sequences? Thanks a lot!
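One possible reading, which is an assumption and not confirmed anywhere in this thread, is that each record above is a separate block of the same feature rather than a separate gene. Under that assumption, a minimal sketch for parsing the `chrom:start-end sequence` records and joining them in coordinate order could look like the following. The function names are hypothetical, and strand is ignored: a minus-strand feature would additionally need reverse-complementing.

```python
import re

# One record: "<chrom>:<start>-<end>" followed by whitespace and a sequence.
RECORD = re.compile(r"(\S+):(\d+)-(\d+)\s+([A-Za-z]+)")


def parse_records(text):
    """Return a list of (chrom, start, end, sequence) tuples found in text."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)), m.group(4))
        for m in RECORD.finditer(text)
    ]


def lengths_match(records):
    """Check that each sequence length equals end - start.

    This holds for BED-style (0-based, half-open) coordinates; the last
    record above (95962-95969, 7 bases) is consistent with that convention.
    """
    return all(len(seq) == end - start for _, start, end, seq in records)


def join_blocks(records):
    """Concatenate sequences ordered by chromosome and start coordinate."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    return "".join(seq for _, _, _, seq in ordered)
```

Whether joining is the right cleanup depends on what the regions represent, so checking the coordinates against the annotation first would be the safer step.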

Best regards, Chao

alexwu66666 avatar Oct 04 '21 07:10 alexwu66666

stale

stephenfloor avatar Apr 21 '23 04:04 stephenfloor