extract-transcript-regions
Multiple sequences extracted for one gene
Hi there,
I am using the provided scripts to extract the regulatory sequences of genes from a genome file, but I get multiple sequences for a single gene and I am not sure why. The output for one gene looks like this:
NC_048263.1:49619-51244 gcattttgctagtaactatatatttgtataaatattgtTATTGTATAACTCTATACCGTTTATCTAGCGCTAGATTTTAATTGATCTGACAAAACAATGATATATAAATATTATGTGTTGCACATAACACATAGAAATACCGAGAAAGCTAGGATCTTTTTCAAGATTTAGCAAAGGTCCGGATGGAGCGTAGAAGGCAGGGAGGCGGAGTTGTTTAGGGGCGGTGGCACAAGCTGCAGGTCCGTGGCGGCTGCTGGAGCTTTGTCGTGGTGGCGTCGATTCGGCGGTGGCTGCAGCTTGATTCACATCGACAGTGCTCGATTTCACGGCGACAGCAAAGTCAGCacctttctcctcttcttcagGTCGCGGCAGCACACAAGCACCATCAAATCCCTCGGCATCCAGCAACCAAATCAGCAACTACTCGCCGCTGCGTTGTGCTCCAGTGCTCGTCGCCTCGTCGGCCTGGTAGTAGGCAACAGAAGCATGGCAGCCCGGCAGTTGAGCGCCTCGTGCAGCAGAGCGGAGGCGTGCTGCTGTCGTCGGCACACCCGTAGTCTGCACCGAGGCGTGCAGTGGCACATCGTCGCCTCGAGGCCGGTGTAGATGGCCTGGCCAGGGCCGGTTCTGGGGCTAGGCCAGAGGGGCGACGGCCGGGGGCCCAAGCCAGGAGGGGGCCCAGGAGTTTATACACAAATGAAGTAGGTGACTAGCAGGCTAAAGTGAAGTCTAATGATTTAAACATTGGTTCTTAGGACTTGTTCAGTAATTGATGCACAATTTATGTCGCTAGTCACTAGGAAATGATTAGCTCACCAGGCGTGCCTTCACACATTGAGTCCCTCAATGGCTCAATCCATTGCTTATTAGTTATTTACTTAAAGCATGCTAAATTGTAGTTCTAAACTTCTAATTCCAAGTGTTACTTTCTTTCAAAATTTAGGTGTTACAATATTTCGCTCCTGTCATTGAACGTTTGGCCCACCTGTCATTGAGACAAAACGCACCAAAGTTAACTTTACTACAGAAACAGTCCTCCTTTTACTATAGGGAGAGATGCCTGAGGGTCGAACGGACAGAGCAGATCAGCCACCAACACACATCCTACGCGCACAAAATCAGATGACGGAGAGCCTCGAACTACACGGGTGGGAGTATAATGCACTCAATCCATCATCTCCGGCTACCGTATGTATGTACACCTTCAAGTAACTAATCAATCCCATCTCTGCGTGATGGTCGGTCGGCCCAGCTCCATCCAAGGGCACATCATCATTCGTGGGTGCATTTCTGGGCCGGGCCGTCCCAACAGAATGAAGCAGGGCCCAACCCATCGGGTGAGGCTGACACTGCCTCTCCACTTTTCGAAATGGTTGGTTGGTGCTGCAACGTGCAACCGGAATCCACCGTTGCAACCACCAGTCACGGTCAAGACTGTCAGACGAGCAAGTGAGCAAAGCATGCGCTCCAGTTAGCTGCAGCAACTCCGGTCTCTGTCTCTGTGAATTCAATATAAATTCGCTCCTAGTGGTGGTGGCCATCCATCGATCGATCTCAGCAATACCAGCAAGCAGCAATAATCTGAAACAAACCTATATATCAGTCCGTGCGTCTGCtgatcg NC_048263.1:54909-55138 CCGGATGGGGAACGGATTGGGTTCCTGGAGTAGCTCGAATGTGAATTGGAGGCTCGATTCGGAGATCCCAAGCTGCCCGGGGACCAAGCGGCGGGGTTTGACCGGGGAAGCCTAGGGTTTTGACCGCCGATCTGGTCGGGAGAGGCCAGAGGTGGGGTGTGGGGGAACGAACGGCCGCGGcacgaggaagaagacgaagaggTAGCCGTTGGGCTGGGTTTGCTCCGCT NC_048263.1:92024-92121 
CTtactcctccctccttctcttccgGGCTCTGCTGCGTCTGCGTCTGCGTCTGCGTCTGCGTGGgcaccactccactccactccagctccagctcca NC_048263.1:95962-95969 cTATGCT
Could you please explain why this happens, or suggest a way to clean up the sequences? Thanks a lot!
Best regards, Chao