Error: could not locate transcript ENST00000472017.1

Open Upreti-Anil opened this issue 1 year ago • 4 comments

Hello StringTie team,

I'm running into an issue where certain transcripts present in my BAM files are not appearing in the GTF output from StringTie. This causes errors when I try to generate transcript count matrices using prepDE.py, as it cannot locate these missing transcripts in some samples.

My setup: I’m using StringTie to assemble transcripts and quantify expression from sorted BAM files, with the -G option pointing to a comprehensive GTF annotation file (from Gencode). For transcript quantification, I run StringTie with these parameters:

stringtie -e $SORTED_BAM_FILE -o ${SAMPLE_NAME}.gtf -p $NUM_THREADS -G $GTF_FILE -A abundances.tab -C cov_refs.gtf -B

Error: could not locate transcript ENST00000697250.1 entry for sample OPL_B ## error from different run
Error: could not locate transcript ENST00000607096.1 entry for sample CEXP_B ## error from different run

Are there specific StringTie parameters that would help ensure more consistent detection of transcripts across samples? Is there a recommended approach for cases where transcripts appear in BAM files but are missing in StringTie’s GTF output, especially for downstream differential expression analysis with prepDE.py?

Any insights or suggested settings would be much appreciated, as I’m aiming to build a comprehensive transcript count matrix compatible with DESeq2.

Thank you!

Upreti-Anil avatar Nov 13 '24 19:11 Upreti-Anil

I have a similar error.

starmoon66 avatar Feb 11 '25 14:02 starmoon66

Have you solved it?

starmoon66 avatar Feb 11 '25 14:02 starmoon66

I started using the Ballgown pipeline instead. Thanks.

--

Thank You With Regards

Anil Upreti, PhD Research Fellow Schepens Eye Research Institute, MEEI Harvard Medical School

Upreti-Anil avatar Feb 11 '25 15:02 Upreti-Anil

Thank you~

starmoon66 avatar Feb 14 '25 03:02 starmoon66

It seems StringTie 3.0.0 generates GTF files with different numbers of rows for different samples, even when a common merged GTF file is used as the guide for quantification. As a result, some transcripts exist only in the output for certain samples. I downgraded to version 2.1.1 and the error went away.
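One way to confirm the row-count difference is to compare the transcript IDs across the per-sample GTFs. The following is a minimal sketch, not part of StringTie or prepDE.py: the function names and the one-GTF-per-sample layout are assumptions for illustration, and it relies only on the standard GTF convention of a `transcript_id "..."` attribute in the ninth column.

```python
import re
import sys
from pathlib import Path

# Standard GTF attribute syntax: transcript_id "ENST...";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_path):
    """Collect transcript_id values from the 'transcript' rows of a GTF file."""
    ids = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header/comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue  # only transcript-level rows are counted
            match = TRANSCRIPT_ID.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def missing_per_sample(gtf_by_sample):
    """Map each sample to the transcripts seen in any sample but absent from it."""
    ids_by_sample = {s: transcript_ids(p) for s, p in gtf_by_sample.items()}
    union = set().union(*ids_by_sample.values())
    return {s: union - ids for s, ids in ids_by_sample.items()}


if __name__ == "__main__":
    # Hypothetical usage: pass one output GTF per sample on the command line;
    # the file stem is used as the sample name.
    samples = {Path(p).stem: p for p in sys.argv[1:]}
    for sample, missing in sorted(missing_per_sample(samples).items()):
        print(f"{sample}\t{len(missing)} transcripts missing")
```

If every sample reports zero missing transcripts, the GTFs agree on the transcript set and the prepDE.py error likely has a different cause.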

Chenmy38 avatar May 07 '25 09:05 Chenmy38

I am having the same error in v3.0.1

Error: could not locate transcript ENSMUST00000192299 entry for sample 254
Traceback (most recent call last):
  File "/scratch/XX/XX/Workspace/XX/08_StringTie/quantification/prepDE.py3", line 282, in
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
    ~^^^^^^
KeyError: '254'

I checked t_data.ctab for this sample, and the transcript's cov and FPKM are both "0.000000".
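That check can be scripted. This is a minimal sketch that assumes t_data.ctab is the tab-delimited Ballgown table with a header row containing `t_name`, `cov`, and `FPKM` columns; `lookup_transcript` is a hypothetical helper, not part of StringTie or prepDE.py.

```python
import csv


def lookup_transcript(ctab_path, transcript_name):
    """Return (cov, FPKM) as floats for one transcript in t_data.ctab, or None.

    Assumes a tab-delimited file whose header names include
    't_name', 'cov' and 'FPKM' (the Ballgown table layout).
    """
    with open(ctab_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["t_name"] == transcript_name:
                return float(row["cov"]), float(row["FPKM"])
    return None  # transcript not listed in this sample's table
```

For example, `lookup_transcript("254/t_data.ctab", "ENSMUST00000192299")` (the path here is a placeholder) would return `(0.0, 0.0)` for the case described above. Whether zero coverage is what causes the transcript to be left out of the sample's GTF is not confirmed in this thread.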

phillipkwest avatar Sep 18 '25 05:09 phillipkwest