extract-transcript-regions
[Suggestion] Option for index/offset and full mRNA transcripts
I love the scripts but had to tweak them a bit because of an issue with the GTF and the genome index. I had to compare my sequences with the ones from Ensembl to work out how many bases they were offset by. It would be nice if I could just pass the offset as a parameter to the script instead of hardcoding it.
Second, could you include an option to extract full-length transcripts as well?
Thanks!
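A minimal sketch of the offset idea, assuming the shift is a single constant for the whole annotation and that coordinates follow the usual GTF layout (start and end in columns 4 and 5). `find_offset`, `shift_gtf_line` and `main` are hypothetical helpers, not part of the scripts discussed here: the first estimates the offset by locating a known (for example Ensembl) sequence in your genome, the second applies it to one GTF line, and `main` exposes it as an `--offset` argument instead of a hardcoded value.

```python
import argparse
import sys
from typing import List, Optional


def find_offset(genome_seq: str, known_seq: str, expected_start: int) -> int:
    """Estimate how far the genome is shifted relative to the annotation.

    known_seq is a sequence whose 1-based start in the annotation (e.g. Ensembl)
    is expected_start. The offset is observed start minus expected start.
    """
    idx = genome_seq.upper().find(known_seq.upper())  # first exact match only
    if idx == -1:
        raise ValueError("known sequence not found in genome")
    return (idx + 1) - expected_start  # convert 0-based index to 1-based start


def shift_gtf_line(line: str, offset: int) -> str:
    """Add offset to the start (column 4) and end (column 5) of one GTF line."""
    if line.startswith("#") or not line.strip():
        return line  # leave comments and blank lines untouched
    fields = line.rstrip("\n").split("\t")
    fields[3] = str(int(fields[3]) + offset)
    fields[4] = str(int(fields[4]) + offset)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


def main(argv: Optional[List[str]] = None) -> None:
    """Command-line entry point: shift every feature in a GTF by --offset bases."""
    parser = argparse.ArgumentParser(description="Shift GTF coordinates by a fixed offset.")
    parser.add_argument("gtf", help="input GTF file")
    parser.add_argument("--offset", type=int, default=0,
                        help="bases to add to start and end (may be negative)")
    args = parser.parse_args(argv)
    with open(args.gtf) as handle:
        for line in handle:
            sys.stdout.write(shift_gtf_line(line, args.offset))
```

Saved as a hypothetical `shift_gtf.py` with `main()` called under an `if __name__ == "__main__":` guard, this could run as `python shift_gtf.py annotation.gtf --offset 2 > shifted.gtf`. A constant offset only fixes a uniform shift; if the GTF and genome come from different assembly versions, a liftover is the safer route.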
Thanks for raising this issue, and glad the scripts are helpful for you. I'm not sure I understand how best to address the GTF/genome mismatch. Could you please link an example?
To your second question, the "exons" files should contain coordinates for full-length transcripts (all exons). Does this work?
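To illustrate what "full-length" means here, the sketch below splices a transcript's exons together in genomic order and reverse-complements the result for minus-strand transcripts. It assumes the exon records are available as GTF lines with a `transcript_id` attribute and that the chromosome sequence is already loaded as a Python string; `exons_by_transcript` and `transcript_sequence` are hypothetical names, not functions from these scripts.

```python
import re
from typing import Dict, Iterable, List, Tuple

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence, preserving soft-masked lowercase."""
    return seq.translate(_COMPLEMENT)[::-1]


def exons_by_transcript(gtf_lines: Iterable[str]) -> Dict[str, Tuple[str, str, List[Tuple[int, int]]]]:
    """Group exon coordinates by transcript_id: {tid: (chrom, strand, [(start, end), ...])}."""
    groups: Dict[str, Tuple[str, str, List[Tuple[int, int]]]] = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows define the spliced transcript
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        entry = groups.setdefault(match.group(1), (fields[0], fields[6], []))
        entry[2].append((int(fields[3]), int(fields[4])))
    return groups


def transcript_sequence(chrom_seq: str, exons: List[Tuple[int, int]], strand: str) -> str:
    """Splice exons (1-based, inclusive GTF coordinates) into one full-length sequence."""
    ordered = sorted(exons)  # genomic order, lowest start first
    spliced = "".join(chrom_seq[start - 1:end] for start, end in ordered)
    return reverse_complement(spliced) if strand == "-" else spliced
```

For each `tid, (chrom, strand, exons)` returned by `exons_by_transcript`, calling `transcript_sequence(genome[chrom], exons, strand)` would give the spliced sequence, including UTRs whenever the GTF's exon rows cover them.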
Hi there,
I am using the scripts to extract the regulatory sequences of genes from a genome file, but I received multiple sequences for one gene and I'm not sure why. The output is below:
>NC_048263.1:49619-51244
gcattttgctagtaactatatatttgtataaatattgtTATTGTATAACTCTATACCGTTTATCTAGCGCTAGATTTTAATTGATCTGACAAAACAATGATATATAAATATTATGTGTTGCACATAACACATAGAAATACCGAGAAAGCTAGGATCTTTTTCAAGATTTAGCAAAGGTCCGGATGGAGCGTAGAAGGCAGGGAGGCGGAGTTGTTTAGGGGCGGTGGCACAAGCTGCAGGTCCGTGGCGGCTGCTGGAGCTTTGTCGTGGTGGCGTCGATTCGGCGGTGGCTGCAGCTTGATTCACATCGACAGTGCTCGATTTCACGGCGACAGCAAAGTCAGCacctttctcctcttcttcagGTCGCGGCAGCACACAAGCACCATCAAATCCCTCGGCATCCAGCAACCAAATCAGCAACTACTCGCCGCTGCGTTGTGCTCCAGTGCTCGTCGCCTCGTCGGCCTGGTAGTAGGCAACAGAAGCATGGCAGCCCGGCAGTTGAGCGCCTCGTGCAGCAGAGCGGAGGCGTGCTGCTGTCGTCGGCACACCCGTAGTCTGCACCGAGGCGTGCAGTGGCACATCGTCGCCTCGAGGCCGGTGTAGATGGCCTGGCCAGGGCCGGTTCTGGGGCTAGGCCAGAGGGGCGACGGCCGGGGGCCCAAGCCAGGAGGGGGCCCAGGAGTTTATACACAAATGAAGTAGGTGACTAGCAGGCTAAAGTGAAGTCTAATGATTTAAACATTGGTTCTTAGGACTTGTTCAGTAATTGATGCACAATTTATGTCGCTAGTCACTAGGAAATGATTAGCTCACCAGGCGTGCCTTCACACATTGAGTCCCTCAATGGCTCAATCCATTGCTTATTAGTTATTTACTTAAAGCATGCTAAATTGTAGTTCTAAACTTCTAATTCCAAGTGTTACTTTCTTTCAAAATTTAGGTGTTACAATATTTCGCTCCTGTCATTGAACGTTTGGCCCACCTGTCATTGAGACAAAACGCACCAAAGTTAACTTTACTACAGAAACAGTCCTCCTTTTACTATAGGGAGAGATGCCTGAGGGTCGAACGGACAGAGCAGATCAGCCACCAACACACATCCTACGCGCACAAAATCAGATGACGGAGAGCCTCGAACTACACGGGTGGGAGTATAATGCACTCAATCCATCATCTCCGGCTACCGTATGTATGTACACCTTCAAGTAACTAATCAATCCCATCTCTGCGTGATGGTCGGTCGGCCCAGCTCCATCCAAGGGCACATCATCATTCGTGGGTGCATTTCTGGGCCGGGCCGTCCCAACAGAATGAAGCAGGGCCCAACCCATCGGGTGAGGCTGACACTGCCTCTCCACTTTTCGAAATGGTTGGTTGGTGCTGCAACGTGCAACCGGAATCCACCGTTGCAACCACCAGTCACGGTCAAGACTGTCAGACGAGCAAGTGAGCAAAGCATGCGCTCCAGTTAGCTGCAGCAACTCCGGTCTCTGTCTCTGTGAATTCAATATAAATTCGCTCCTAGTGGTGGTGGCCATCCATCGATCGATCTCAGCAATACCAGCAAGCAGCAATAATCTGAAACAAACCTATATATCAGTCCGTGCGTCTGCtgatcg
>NC_048263.1:54909-55138
CCGGATGGGGAACGGATTGGGTTCCTGGAGTAGCTCGAATGTGAATTGGAGGCTCGATTCGGAGATCCCAAGCTGCCCGGGGACCAAGCGGCGGGGTTTGACCGGGGAAGCCTAGGGTTTTGACCGCCGATCTGGTCGGGAGAGGCCAGAGGTGGGGTGTGGGGGAACGAACGGCCGCGGcacgaggaagaagacgaagaggTAGCCGTTGGGCTGGGTTTGCTCCGCT
>NC_048263.1:92024-92121
CTtactcctccctccttctcttccgGGCTCTGCTGCGTCTGCGTCTGCGTCTGCGTCTGCGTGGgcaccactccactccactccagctccagctcca
>NC_048263.1:95962-95969
cTATGCT
Could you please explain why this happens, or suggest a way to clean up the sequences? Thanks a lot!
Best regards, Chao
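One likely explanation, not confirmed in this thread: the `chrom:start-end` headers look like the default naming of `bedtools getfasta`, which writes one record per input interval, so a gene whose regulatory region is listed as several intervals (for example one per UTR exon, or one per transcript) yields several records. The spans also fit 0-based starts (95962-95969 covers 7 bases, matching `cTATGCT`). Assuming all records belong to one plus-strand gene, a minimal clean-up sketch parses them, optionally drops very short fragments via a hypothetical `min_length` cut-off, and joins the rest in coordinate order. `parse_fasta` and `merge_records` are illustrative names; for a minus-strand gene the order (and possibly orientation) would need reversing.

```python
import re
from typing import List, Tuple

# Header like ">NC_048263.1:49619-51244" (chrom:start-end)
_HEADER = re.compile(r"^>([^:\s]+):(\d+)-(\d+)")

Record = Tuple[str, int, int, str]  # (chrom, start, end, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text whose headers are chrom:start-end into records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((*header, "".join(chunks)))
            match = _HEADER.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line}")
            header = (match.group(1), int(match.group(2)), int(match.group(3)))
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((*header, "".join(chunks)))
    return records


def merge_records(records: List[Record], min_length: int = 0) -> str:
    """Join records in coordinate order, dropping those shorter than min_length.

    Assumes every record comes from the same plus-strand gene.
    """
    kept = [r for r in records if len(r[3]) >= min_length]
    if len({r[0] for r in kept}) > 1:
        raise ValueError("records come from more than one sequence")
    kept.sort(key=lambda r: r[1])  # order by start coordinate
    return "".join(r[3] for r in kept)
```

For example, `merge_records(parse_fasta(text), min_length=10)` would drop the 7-bp `cTATGCT` fragment. Joining non-adjacent intervals produces an artificial sequence, so whether this is appropriate depends on what the intervals represent and how the result is used downstream.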