Finder
Distant exon prediction
Hello @sagnikbanerjee15, During the CodAn step, execution terminates because the annotation.gtf file is missing when the findCDS function is launched. The file is missing because the CodAn process itself terminates on duplicate entries in the combined_split_transcripts_with_bad_SJ_redundancy_removed.fasta file. On closer inspection of the GTF file, these duplicate FASTA entries arise from transcripts whose predicted exons are very far apart and which are therefore split into two FASTA entries. For example:
1 FINDER transcript 57285934 67077786 1000 - . gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479";
1 FINDER exon 57285934 57285945 1000 - . gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479";
1 FINDER exon 67069393 67069715 1000 - . gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479";
1 FINDER exon 67077589 67077786 1000 - . gene_id "1.1714_0_covsplit"; transcript_id "1.1714_0_covsplit.0"; FPKM "0.140324"; TPM "1.128119"; cov "1.697479";
OR
1 FINDER transcript 52209472 65689477 1000 - . gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0"; FPKM "0.272933"; TPM "2.573257"; cov "6.587509";
1 FINDER exon 52209472 52209482 1000 - . gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0"; FPKM "0.272933"; TPM "2.573257"; cov "6.587509";
1 FINDER exon 65688877 65689477 1000 - . gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0"; FPKM "0.272933"; TPM "2.573257"; cov "6.587509";
OR
1 FINDER transcript 38671110 52554711 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38671110 38671164 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38672247 38672348 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38673558 38673756 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38674078 38674319 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38675099 38675342 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38675872 38676033 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38677051 38677323 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 38677638 38677885 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
1 FINDER exon 52554169 52554711 1000 - . gene_id "1.1108_1_covsplit"; transcript_id "1.1108_1_covsplit.1"; FPKM "0.241149"; TPM "2.245845"; cov "4.710058";
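To list every transcript that shows this pattern, something like the following could be run over the GTF. It is only a rough sketch: the function name and the 1,000,000 bp cutoff are my own assumptions and are not part of FINDER. It reports the largest gap between consecutive exons of each transcript.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def long_intron_transcripts(gtf_lines, max_intron=1_000_000):
    """Return {transcript_id: largest_gap} for transcripts whose largest gap
    between consecutive exons exceeds max_intron (an assumed cutoff)."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first eight GTF columns contain no whitespace, so this works
        # for both tab- and space-separated records.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    flagged = {}
    for tid, coords in exons.items():
        coords.sort()
        # Gap = number of bases between the end of one exon and the next start.
        gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(coords, coords[1:])]
        if gaps and max(gaps) > max_intron:
            flagged[tid] = max(gaps)
    return flagged


example = [
    '1 FINDER transcript 52209472 65689477 1000 - . gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0";',
    '1 FINDER exon 52209472 52209482 1000 - . gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0";',
    '1 FINDER exon 65688877 65689477 1000 - . gene_id "1.1561_1_covsplit"; transcript_id "1.1561_1_covsplit.0";',
]
print(long_intron_transcripts(example))
```

For the second example above, this flags 1.1561_1_covsplit.0 with a gap of roughly 13.5 Mb between its two exons.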
This seems to affect only the FINDER-predicted transcripts and does not occur with the PsiCLASS transcripts. Examining the issue further is quite laborious for me, so I was hoping you could explain why and where this might be occurring, and whether there is a quick fix. Completely removing these entries from the combined_split_transcripts_with_bad_SJ_redundancy_removed files does let the run complete, but that is not ideal. I was thinking of simply splitting these entries into two separate transcripts with a custom script, which should hopefully work but would not reveal the source of the issue.
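The custom script I have in mind would look roughly like this. It is a minimal sketch under my own assumptions: the 1,000,000 bp cutoff and the `_part` suffix are hypothetical, and I have not verified that the resulting IDs are what CodAn or the later FINDER steps expect.

```python
def split_at_long_introns(exons, max_intron=1_000_000):
    """Split (start, end) exons into groups wherever the gap between
    neighbouring exons exceeds max_intron (an assumed cutoff)."""
    groups = []
    for start, end in sorted(exons):
        # Open a new group for the first exon or after an oversized gap.
        if not groups or start - groups[-1][-1][1] - 1 > max_intron:
            groups.append([])
        groups[-1].append((start, end))
    return groups


def name_fragments(transcript_id, groups):
    """Give each fragment a unique ID; the '_part' suffix is hypothetical."""
    if len(groups) == 1:
        return {transcript_id: groups[0]}
    return {f"{transcript_id}_part{i}": g for i, g in enumerate(groups, 1)}


# Exons of 1.1714_0_covsplit.0 from the first example above.
exons = [(57285934, 57285945), (67069393, 67069715), (67077589, 67077786)]
fragments = name_fragments("1.1714_0_covsplit.0", split_at_long_introns(exons))
for name, group in fragments.items():
    print(name, group)
```

Here the first exon ends up in its own fragment and the two nearby exons stay together. Each fragment gets a distinct ID, which should avoid duplicate FASTA headers, but it still would not explain where the distant exons come from.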