Aborted (core dumped) with LTRdigest
Problem description
When I run LTRdigest, the following error always appears (it also appears in RStudio when LTRdigest is called via the LTRpred package):
This is a bug, please report it at
https://github.com/genometools/genometools/issues
Please make sure you are running the latest release which can be found at
http://genometools.org/pub/
You can check your version number with gt -version.
Aborted (core dumped)
Exact command line call triggering the problem
#PATH:
proteins="/home/omar-almulla/Downloads/"
genome="/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/genomes/"
gff3="/home/omar-almulla/Desktop/Prunus_TE_project/OUTPUT/EDTA_outputs/20-WGS-PCE.2.0/20-WGS-PCE.2.0_shortIDs.fasta.mod.EDTA.raw/LTR/"
gt ltrdigest -hmms $proteins/Pfam-A.hmm -aaout -outfileprefix ltrs_sorted -seqfile $genome/20-WGS-PCE.2.0_shortIDs.fasta -matchdescstart < $gff3/LTR/ltrs_sorted.gff3 > ltrdigest.gff3
What GenomeTools version are you reporting an issue for (as output by gt -version)?
gt (GenomeTools) 1.6.2
Copyright (c) 2003-2016 G. Gremme, S. Steinbiss, S. Kurtz, and CONTRIBUTORS
Copyright (c) 2003-2016 Center for Bioinformatics, University of Hamburg
See LICENSE file or http://genometools.org/license.html for license details.
Used compiler: cc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Compile flags: -g -Wall -Wunused-parameter -pipe -fPIC -Wpointer-arith -Wno-unknown-pragmas -O3 -Werror
What operating system (e.g. Ubuntu, Mac OS X), OS version (e.g. 15.10, 10.11) and platform (e.g. x86_64) are you using?
Ubuntu 20.04
Are you sure there are no more lines before the "This is a bug" line? I would need those to locate the issue, as they describe the error context. Also, would you be OK with sharing some of your input files to help reproduce the problem? Thanks!
Thanks for the reply. For now I have worked around the problem by giving LTRdigest the output obtained from EDTA version 1.6. The error above appeared only when using the output of EDTA version 1.9. Unfortunately, I am not allowed to make the input files public, as they are part of a publication in progress. In any case, with these changes everything works:
`tRNAs="/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/Hmm_trna"
proteins="/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/Hmm_trna"
genome='/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/genomes/Prunus_avium_NCBI'
EDTA_output_1_6_path="/home/omar-almulla/Desktop/Prunus_TE_project/OUTPUTS/EDTA_output/Prunus_avium_NCBI/EDTA_1.6_output/Prunus_avium_NCBI_genomic.fna.EDTA.raw"
output="/home/omar-almulla/Desktop/Prunus_TE_project/OUTPUTS/LTRdigest_output/Prunus_avium_NCBI"
gt -j 4 ltrdigest -outfileprefix Prunus_avium_NCBI_ltr -trnas $tRNAs/plants-tRNA_cat.fa -hmms $proteins/hmm_* -seqfile $genome/Prunus_avium_NCBI_genomic.fna -matchdescstart $EDTA_output_1_6_path/Prunus_avium_NCBI_genomic.fna.LTR.intact.fa_SORTED_.1.6.gff3 > $output/Prunus_avium_NCBI_digest.gff`
I see. I'll keep this one open, but I cannot do much without the test data. I am unfortunately not familiar with EDTA or LTRpred, but perhaps that tool creates an unusual GFF3 structure?
Anyway, could you please still share the line you got before the "this is a bug, please report" line, if that's OK for you? It should contain something like "Assertion failed: ..." and would at least help us place the error somewhere, and also make this issue searchable for others with a similar problem.
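One way to avoid losing the line that precedes "This is a bug" is to send standard error to a file and look at the end of that file after the run. The following is only a minimal sketch: the function name `run_with_stderr_log` and the log file are made up for this illustration and are not part of GenomeTools, and the `fake_crash` function merely imitates a failing program so that the snippet runs without `gt` installed.

```shell
# Run a command, keep what it writes to stderr in a log file,
# and show the tail of that log if the command fails.
# run_with_stderr_log is a hypothetical helper, not a GenomeTools command.
run_with_stderr_log() {
  local log="$1"; shift
  local status=0
  # stdout is left untouched, so redirecting the GFF3 output still works
  "$@" 2> "$log" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command exited with status $status; last lines of $log:" >&2
    tail -n 20 "$log" >&2
  fi
  return "$status"
}

# Stand-in for a crashing program: writes a message to stderr and fails.
fake_crash() {
  echo "Assertion failed: (example message)" >&2
  return 134
}

demo_log="$(mktemp)"
run_with_stderr_log "$demo_log" fake_crash 2>/dev/null || demo_status=$?
```

With the real call, `fake_crash` would be replaced by the full `gt -j 4 ltrdigest ...` command and its arguments, with the `> ...gff` redirect placed at the very end of the line. Note that a message such as "Aborted (core dumped)" is usually printed by the shell itself rather than by the program, so it may not end up in the log.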
My script:
gt -j 4 ltrdigest -outfileprefix Prunus_avium_ltr -trnas ./INPUT/Hmm_trna/plants-tRNA_cat.fa -hmms ./INPUT/Hmm_trna/hmm_* -seqfile ./INPUT/genomes/Prunus_avium_NCBI/Prunus_avium_NCBI.fna -matchdescstart ./OUTPUTS/EDTA_output/Prunus_avium_NCBI/EDTA_1.9_output/Prunus_avium_NCBI.fna.mod.EDTA.raw/*SORTED.gff3 > Prunus_avium_digest.gff
I could not reproduce the same error. Now the following appears:
Segmentation fault (core dumped)
##gff-version 3
##sequence-region CM024352.1 1 62324707
##sequence-region CM024353.1 1 46928806
##sequence-region CM024354.1 1 42862123
##sequence-region CM024355.1 1 37373756
##sequence-region CM024356.1 1 41299679
##sequence-region CM024357.1 1 42624765
##sequence-region CM024358.1 1 30632009
##sequence-region CM024359.1 1 38835769
##sequence-region JAAOZG010000014 1 51232
##sequence-region JAAOZG010000020 1 36342
##sequence-region JAAOZG010000023 1 31182
##sequence-region JAAOZG010000027 1 27350
##sequence-region JAAOZG010000035 1 22413
##sequence-region JAAOZG010000061 1 97395
CM024352.1 EDTA repeat_region 191737 201094 . ? . ID=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000657;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA target_site_duplication 191737 191741 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA long_terminal_repeat 191742 193449 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA LTR_retrotransposon 191742 201089 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000186;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA long_terminal_repeat 199383 201089 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA target_site_duplication 201090 201094 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA repeat_region 1617430 1629426 . ? . ID=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000657;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA target_site_duplication 1617430 1617434 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000434;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA long_terminal_repeat 1617435 1619599 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000286;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA Gypsy_LTR_retrotransposon 1617435 1629421 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0002265;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA long_terminal_repeat 1627258 1629421 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000286;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA target_site_duplication 1629422 1629426 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000434;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA repeat_region 1946186 1956558 . ? . ID=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000657;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA target_site_duplication 1946186 1946190 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA long_terminal_repeat 1946191 1948386 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA LTR_retrotransposon 1946191 1956553 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000186;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA long_terminal_repeat 1954358 1956553 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA target_site_duplication 1956554 1956558 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
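Since the thread has not established what in the EDTA 1.9 output triggers the crash, a quick tabulation of feature types and strands might help when comparing the EDTA 1.6 and 1.9 files side by side. In the excerpt above, for instance, the second element is typed `Gypsy_LTR_retrotransposon` while the other two are `LTR_retrotransposon`; whether that matters to LTRdigest is an open question here. The sketch below assumes a tab-separated GFF3 file, and `summarize_gff3` is a hypothetical helper name, not a GenomeTools command. The demo input reuses two records from the excerpt with shortened attributes.

```shell
# Count how often each (feature type, strand) pair occurs in a GFF3 file.
# summarize_gff3 is a hypothetical helper name for this sketch.
summarize_gff3() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }          # skip directives, comments, short lines
    { count[$3 "\t" $7]++ }          # column 3 = type, column 7 = strand
    END { for (k in count) print count[k] "\t" k }
  ' "$1" | sort -t "$(printf '\t')" -k2,2 -k3,3
}

# Tiny demo input: two records from the excerpt, attributes shortened.
demo_gff="$(mktemp)"
{
  printf '##gff-version 3\n'
  printf 'CM024352.1\tEDTA\tLTR_retrotransposon\t191742\t201089\t.\t?\t.\tParent=repeat_region1\n'
  printf 'CM024352.1\tEDTA\tGypsy_LTR_retrotransposon\t1617435\t1629421\t.\t?\t.\tParent=repeat_region2\n'
} > "$demo_gff"

summarize_gff3 "$demo_gff"
```

Running the function on both the 1.6 and the 1.9 GFF3 files and comparing the two tables with `diff` would show at a glance which feature types exist in only one of them.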
I am afraid the GFF3 file alone is not enough for me to replicate the issue; I would also need the other files (sequence FASTA and tRNA files). Basically, I need a way to trigger the error on my side with your command line call. Thanks!
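If sharing a reduced data set ever becomes possible, one option would be to cut both the GFF3 and the FASTA file down to a single sequence and check whether the crash still occurs on that subset. The following is only a sketch under assumptions: `subset_gff3` and `subset_fasta` are hypothetical helper names, the GFF3 is assumed to be tab-separated, and the FASTA ID is taken to be the first word after `>`. Whether the error persists on such a subset is not known from this thread.

```shell
# Reduce a GFF3 file and a FASTA file to one sequence ID.
# subset_gff3 and subset_fasta are hypothetical helpers for this sketch.
subset_gff3() {   # usage: subset_gff3 SEQID in.gff3
  awk -F'\t' -v id="$1" '
    /^##gff-version/     { print; next }
    /^##sequence-region/ { split($0, f, " "); if (f[2] == id) print; next }
    /^#/                 { next }      # drop other directives and comments
    $1 == id             { print }     # keep features on the chosen sequence
  ' "$2"
}

subset_fasta() {  # usage: subset_fasta SEQID in.fasta
  awk -v id="$1" '
    /^>/ { split(substr($0, 2), f, " "); keep = (f[1] == id) }
    keep { print }                     # header and sequence lines of the match
  ' "$2"
}

# Demo on a made-up two-record FASTA file.
demo_fa="$(mktemp)"
printf '>seqA demo\nACGT\n>seqB demo\nTTTT\n' > "$demo_fa"
subset_fasta seqB "$demo_fa"
```

For the data in this thread, the sequence ID would be one of those listed in the `##sequence-region` lines, for example `CM024352.1`.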