
Distant exon prediction

Open Maxim-Karpov opened this issue 1 month ago • 2 comments

Hello @sagnikbanerjee15, during the CodAn step, execution terminates because the annotation.gtf file is missing when the findCDS function is launched. The file is missing because the CodAn process itself terminates on duplicate entries found in the combined_split_transcripts_with_bad_SJ_redundancy_removed.fasta file. Closer inspection of the GTF file shows that these duplicate FASTA entries arise from transcripts whose predicted exons are very far apart and which are therefore split into two FASTA entries. For example:

1	FINDER	transcript	57285934	67077786	1000	-	.	gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479"; 

1	FINDER	exon	57285934	57285945	1000	-	.	gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479"; 

1	FINDER	exon	67069393	67069715	1000	-	.	gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479"; 

1	FINDER	exon	67077589	67077786	1000	-	.	gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479"; 

OR

1	FINDER	transcript	52209472	65689477	1000	-	.	gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0"; FPKM "0.272933"; TPM "2.573257"; cov "6.587509"; 

1	FINDER	exon	52209472	52209482	1000	-	.	gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0"; FPKM "0.272933"; TPM "2.573257"; cov "6.587509"; 

1	FINDER	exon	65688877	65689477	1000	-	.	gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0"; FPKM "0.272933"; TPM "2.573257"; cov "6.587509"; 

OR

1	FINDER	transcript	38671110	52554711	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38671110	38671164	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38672247	38672348	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38673558	38673756	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38674078	38674319	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38675099	38675342	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38675872	38676033	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38677051	38677323	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	38677638	38677885	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 

1	FINDER	exon	52554169	52554711	1000	-	.	gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058"; 
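
To find every transcript with this pattern rather than spotting them by eye, something like the following could work. It is only a minimal sketch: the 1,000,000 bp cutoff and the function names are my own assumptions, not FINDER parameters, and it only reads the `exon` lines and the `transcript_id` attribute of a GTF like the one above.

```python
import re
from collections import defaultdict

# Hypothetical cutoff in bp; chosen for illustration, not taken from FINDER.
MAX_INTRON = 1_000_000


def exons_by_transcript(gtf_lines):
    """Collect (start, end) exon coordinates per transcript_id."""
    exons = defaultdict(list)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, malformed lines, and non-exon features.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return exons


def largest_intron(exon_list):
    """Return the largest gap (in bp) between consecutive exons, or 0."""
    ordered = sorted(exon_list)
    gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]
    return max(gaps, default=0)


def flag_distant_exons(gtf_lines, max_intron=MAX_INTRON):
    """Map transcript_id -> largest intron for transcripts above the cutoff."""
    flagged = {}
    for transcript_id, exon_list in exons_by_transcript(gtf_lines).items():
        gap = largest_intron(exon_list)
        if gap > max_intron:
            flagged[transcript_id] = gap
    return flagged
```

It can be run on an open file handle, e.g. `flag_distant_exons(open("annotation.gtf"))`, where the file name is a placeholder for whichever GTF is being inspected.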

This seems to affect only the FINDER-predicted transcripts and does not occur with the PsiCLASS transcripts. Examining the issue further is quite laborious for me, so I was hoping you could explain why or where this might be occurring, and whether there is a quick fix. Completely removing these entries from the combined_split_transcripts_with_bad_SJ_redundancy_removed files does let the run complete, but that is not ideal. I was thinking of simply splitting these entries into two separate transcripts with a custom script, which should hopefully work but would not reveal the source of the issue.
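
For reference, the splitting workaround I have in mind would look roughly like this. It is a sketch under my own assumptions: the 1,000,000 bp cutoff and the `_part1`, `_part2` naming are hypothetical and not something FINDER produces, and it works on plain (start, end) exon tuples rather than rewriting the GTF itself.

```python
def split_at_long_introns(exons, max_intron=1_000_000):
    """Group exons into segments, starting a new segment after each
    gap larger than max_intron (a hypothetical cutoff in bp)."""
    segments = []
    for start, end in sorted(exons):
        # Gap between this exon and the last exon of the current segment.
        if segments and start - segments[-1][-1][1] - 1 <= max_intron:
            segments[-1].append((start, end))
        else:
            segments.append([(start, end)])
    return segments


def rename_segments(transcript_id, segments):
    """Give each segment its own transcript ID (naming is illustrative)."""
    if len(segments) == 1:
        return {transcript_id: segments[0]}
    return {
        f"{transcript_id}_part{i}": segment
        for i, segment in enumerate(segments, start=1)
    }


# Exons of 1.1714_0_covsplit.0 from the GTF excerpt above.
exons = [
    (57285934, 57285945),
    (67069393, 67069715),
    (67077589, 67077786),
]
parts = rename_segments("1.1714_0_covsplit.0", split_at_long_introns(exons))
for new_id, segment in parts.items():
    print(new_id, segment)
```

For this transcript the first segment would be a single 12 bp exon, so some of the resulting pieces may still need to be filtered out by length.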

Maxim-Karpov, May 07 '24 10:05