
`agat_sp_extract_sequences.pl`: support for multicistronic transcripts.

Open jdcla opened this issue 2 months ago • 0 comments

Is your feature request related to a problem? Please describe. Currently, `agat_sp_extract_sequences.pl` (and possibly other scripts) does not support multicistronic transcripts. Although few GTF/GFF tools support this feature, studies increasingly report translated ORFs located upstream, downstream, or elsewhere relative to canonical coding sequences.

Describe the solution you'd like When running `agat_sp_extract_sequences.pl`, I would like it to handle multiple CDSs defined per transcript/mRNA feature. To start off, the tool would use CDS IDs rather than transcript IDs as FASTA headers (see this issue). Currently, I think the tool either ignores multicistronic CDSs or merges those that share a transcript ID.
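To make the request concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not AGAT code and does not reflect how AGAT works internally. It assumes GFF3 input in which all segments of one CDS share the same `ID` attribute (as the GFF3 convention for discontinuous features suggests), and every feature ID, the toy genome, and all function names are made up for illustration.

```python
from collections import defaultdict

# Toy GFF3 (hypothetical IDs): one mRNA carrying two CDSs, an upstream ORF
# and a main ORF that is split across two segments.
GFF3 = (
    "chr1\tdemo\tmRNA\t1\t30\t.\t+\t.\tID=tx1\n"
    "chr1\tdemo\tCDS\t1\t9\t.\t+\t0\tID=cds_uORF;Parent=tx1\n"
    "chr1\tdemo\tCDS\t13\t18\t.\t+\t0\tID=cds_main;Parent=tx1\n"
    "chr1\tdemo\tCDS\t22\t27\t.\t+\t0\tID=cds_main;Parent=tx1\n"
)
# Toy genome (hypothetical), 30 bases long.
GENOME = {"chr1": "ATGAAATAGCCCATGGGGTTTCCCTAAGGG"}


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    return dict(pair.split("=", 1) for pair in column9.strip().split(";") if pair)


def group_cds_by_id(gff3_text):
    """Group CDS segments by (transcript ID, CDS ID) instead of transcript ID alone."""
    groups = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        key = (attrs["Parent"], attrs["ID"])
        groups[key].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return groups


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract_cds_sequences(gff3_text, genome):
    """Return {fasta_header: sequence} with one record per CDS ID."""
    records = {}
    for (parent, cds_id), segments in group_cds_by_id(gff3_text).items():
        segments.sort(key=lambda seg: seg[1])  # order segments by start coordinate
        strand = segments[0][3]
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in segments)
        if strand == "-":
            seq = revcomp(seq)
        # The CDS ID leads the header; the transcript ID is kept as extra information.
        records[f"{cds_id} transcript={parent}"] = seq
    return records


if __name__ == "__main__":
    for header, seq in extract_cds_sequences(GFF3, GENOME).items():
        print(f">{header}\n{seq}")
```

With this grouping, the two CDSs on `tx1` come out as two separate FASTA records instead of being merged into one sequence or dropped, while the header still shows that both belong to the same transcript.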

Describe alternatives you've considered Today, it is possible to define a unique mRNA feature for each CDS, similar to the solution described here. This is a hacky workaround that fails to show that the CDSs belong to the same transcript.
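For reference, the workaround amounts to something like the rough Python sketch below. It is an illustration only, not the linked solution: the feature IDs, the `<transcript>.<CDS>` naming scheme, and the function name are all hypothetical, and for brevity it keeps only the `ID` and `Parent` attributes.

```python
# Toy GFF3 (hypothetical IDs): one mRNA with two CDSs.
GFF3_IN = (
    "chr1\tdemo\tmRNA\t1\t30\t.\t+\t.\tID=tx1\n"
    "chr1\tdemo\tCDS\t1\t9\t.\t+\t0\tID=cds_uORF;Parent=tx1\n"
    "chr1\tdemo\tCDS\t13\t27\t.\t+\t0\tID=cds_main;Parent=tx1\n"
)


def attributes(column9):
    """Turn 'ID=x;Parent=y' into a dict."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def split_mrna_per_cds(gff3_text):
    """Duplicate each mRNA so that every CDS ID gets its own parent mRNA."""
    rows = [line.split("\t") for line in gff3_text.splitlines()
            if line and not line.startswith("#")]
    mrnas = {attributes(r[8])["ID"]: r for r in rows if r[2] == "mRNA"}
    out, seen = [], set()
    for r in rows:
        if r[2] != "CDS":
            continue
        attrs = attributes(r[8])
        new_tx = f"{attrs['Parent']}.{attrs['ID']}"  # hypothetical naming scheme
        if new_tx not in seen:
            seen.add(new_tx)
            mrna_copy = list(mrnas[attrs["Parent"]])
            mrna_copy[8] = f"ID={new_tx}"
            out.append("\t".join(mrna_copy))
        out.append("\t".join(r[:8] + [f"ID={attrs['ID']};Parent={new_tx}"]))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(split_mrna_per_cds(GFF3_IN), end="")
```

After this rewrite, the only hint that `cds_uORF` and `cds_main` came from `tx1` is the naming convention, which is exactly the information loss described above.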

jdcla • Apr 08 '24 20:04