dammit
empty transcript names in final gff3
I am running the latest release. All tasks seem to run OK, but the final gff3 file contains many lines like these at the beginning:
##gff-version 3.2.1
. HMMER protein_hmm_match 1000 1035 11000.0 . . ID=homology:12545;Name=LRR_1;Target=LRR_1 2 13 +;Note=Leucine Rich Repeat;accuracy=0.76;env_coords=997 1074;Dbxref="Pfam:PF00560.29"
. HMMER protein_hmm_match 1000 1038 150.0 . . ID=homology:12126;Name=Kelch_5;Target=Kelch_5 14 27 +;Note=Kelch motif;accuracy=0.71;env_coords=964 1065;Dbxref="Pfam:PF13854.2"
. HMMER protein_hmm_match 1000 1077 0.021 . . ID=homology:14530;Name=EGF;Target=EGF 1 31 +;Note=EGF-like domain;accuracy=0.87;env_coords=1000 1080;Dbxref="Pfam:PF00008.23"
....
Note the lack of a transcript name in the first column. Later in the file things look better:
Transcript_9 transdecoder CDS 6481 6729 . + 0 ID=cds.Gene.13::Transcript_9::g.13::m.13;Parent=Gene.13::Transcript_9::g.13::m.13
What went wrong?
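To gauge how many records are affected, something like the following could be run over the merged file. This is only a rough sketch: the function name `count_missing_seqids` is made up for illustration, it is not part of dammit, and it assumes the file on disk is tab-delimited as GFF3 normally is (the excerpts above lost their tabs when pasted).

```python
def count_missing_seqids(lines):
    """Return (missing, total) over GFF3 feature lines.

    A record counts as 'missing' when its first column (the seqid) is
    empty or just the '.' placeholder. Assumes tab-delimited columns.
    """
    missing = 0
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and directives/comments such as ##gff-version
        if not line or line.startswith("#"):
            continue
        total += 1
        seqid = line.split("\t", 1)[0].strip()
        if seqid in ("", "."):
            missing += 1
    return missing, total
```

Passing an open file handle for the final gff3 (any iterable of lines works) returns the number of records without a transcript name and the total number of records, which would show whether only the HMMER rows are hit.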
j131
Curious! Can you give me your dammit version info (dammit --version), a little snippet of the input FASTA (head INPUT.fa), and a bit of your log file (tail -n 200 ~/.dammit/log/dammit-all.log)?
$ dammit --version
dammit 0.3
$ head tr-pro.fa
>TRINITY_DN14884_c0_g1_i1 len=407 path=[655:0-292 656:293-349 657:350-406] [-1, 655, 656, 657, -2]
GTTATTTTAATGGATGACAAAGTAACATTTCCAAAATGCTAATATGCTCTTTTGTCGATA
AATTTGACCCTTTGACCTCTAAATGTGACAATCACTTGAGGAAGGTTGCAGACCTGGATT
ATCAGGATAAAAAAATCAATCTGCTTGACTGGACTAACCCTCTACCATCTCCTTATGGCC
CATATTGCACAGGACCAAACAATAAAGTTTGGACTTGTAATACACCATACTGCCCCAAAA
CAGGTAGAGTTAGGCACACATGACTGCACTTATTCTCTACCATCTCCTTATGGAGAGAAT
GGAATGGAGTTGTTATATGACATGCTGTATTCTCTACCATCTCCTTATGGAGAGAATGGA
ATGGAGTTGTTATATGACATGCTGTATTCTCTACCATCTCCTTATGG
>TRINITY_DN14856_c0_g1_i1 len=362 path=[679:0-361] [-1, 679, -2]
GGAAAGTGTTGTCCCGGATGCCTGTGTGGACTGCACAGGCTAATCTGGGACGACACGTAA
Hi, I am having the same issue with the GFF3 output file. Transcript names are missing in the first column for the HMMER results, but not for later lines e.g. transdecoder. Any word on resolving this issue? I'd really like to use the program.
Here are the details:
Version: dammit 0.3
head input.fasta
>TRINITY_DN7361_c0_g1_i1 len=890
ATCTGAGTTATCTGTAGAGACAGTGGAGACAAAGTGAATTAGGGTGAGATGCTTGAAATG
CCCCCCCTTCACCTCCATTATTCAGCCCCTCCCTACAGACCCACACACACTCTCGTATAC
ACAGATGTCTATATATGTGACACTCCTCCACCATCTCAGTCTCACAGTAGCCAATGAAAT
GAGAGCTGCTTGATTTTAGCACCAACACCCAGCTCACGTCATCAGGTTCCCCCCACCCCA
GCCCCCACACCCTCTCCTCCCATGTGTGTGTGCAGTGTGTGTCCTCCCATCTATGTCTGC
ATGCCTGACAGGATAAATCCAGCCCCCCCCCATACTCTCTCTCTCTCTTTCTCCCTGGCT
CCGTCACTCACACACACACACCCCTACACACACACAGGTGCTGCTCTTTGGCAAGCCTAA
CTCTTACACACACAAAACACACAAACTCAAGAGAGGGTATTAGTTGGTCCACTTTTGTGG
TGCTGAGGAACTTGATATTTTGCCTTCATGGAGGATTACCACAAACCTGACCAGCAGACA
tail -n 200 ~/.dammit/log/dammit-all.log
10-14 23:10:45 dammit.annotate:write:85 [INFO]
[ ] maf_best_hits:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.x.orthodb.maf-trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.x.orthodb.maf.best.csv
10-14 23:12:33 dammit.annotate:write:85 [INFO]
[ ] maf-gff3:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.x.orthodb.maf.gff3
10-14 23:12:55 dammit.annotate:write:85 [INFO]
[ ] hmmscan-gff3:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.pfam.csv.gff3
10-14 23:14:16 dammit.annotate:write:85 [INFO]
[ ] cmscan-gff3:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.rfam.tbl.gff3
10-14 23:14:17 dammit.annotate:write:85 [INFO]
[ ] gff3-merge:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.dammit.gff3
10-14 23:14:35 dammit.annotate:write:85 [INFO]
[ ] fasta-annotate:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.dammit.fasta
10-14 23:32:40 AnnotateHandler:run_tasks:123 [DEBUG]
chdir: /scratch/bluehead_wrasse/transcriptome/trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr/per_sample_error_corrected_k23/Trinity-output/analysis/annotation/dammit
10-17 13:25:01 root:run:69 [DEBUG]
*** dammit! begin ***
I haven't managed to reproduce this, and unfortunately the logs haven't been very useful. Could you give me as much of the transcriptome fasta as you can reasonably send, so that I can try to run it on my machine? I can think of a couple places where things could go wrong, but unfortunately, the one transcript doesn't seem to trigger it (which kinda shoots down my "my regex sucks" idea).
Camille. I am having this bug, too. Can help send you a fasta if you still need one.
That'd be helpful! :)
Camille Scott
Graduate Group for Computer Science Lab for Data Intensive Biology University of California, Davis
Pleased to report that after a serendipitous sit-down with @butterflyology, we were able to reproduce this error on his data and I now have everything I need to fix it! So, should be good to go soon enough.
Is this an easy fix on my end? I have a couple of transcriptomes waiting to run.
Any update here?
The issue is fixed in the 1.0 beta branch; I'll put up some instructions for upgrading shortly.