empty transcript names in final gff3

Open aburzynski opened this issue 8 years ago • 10 comments

I am running the latest release. All tasks seem to run OK, but the final gff3 file contains many lines like these at the beginning:

gff-version 3.2.1

. HMMER protein_hmm_match 1000 1035 11000.0 . . ID=homology:12545;Name=LRR_1;Target=LRR_1 2 13 +;Note=Leucine Rich Repeat;accuracy=0.76;env_coords=997 1074;Dbxref="Pfam:PF00560.29"
. HMMER protein_hmm_match 1000 1038 150.0 . . ID=homology:12126;Name=Kelch_5;Target=Kelch_5 14 27 +;Note=Kelch motif;accuracy=0.71;env_coords=964 1065;Dbxref="Pfam:PF13854.2"
. HMMER protein_hmm_match 1000 1077 0.021 . . ID=homology:14530;Name=EGF;Target=EGF 1 31 +;Note=EGF-like domain;accuracy=0.87;env_coords=1000 1080;Dbxref="Pfam:PF00008.23"
....

Note the missing transcript name in the first column. Later in the file, things look better:

Transcript_9 transdecoder CDS 6481 6729 . + 0 ID=cds.Gene.13::Transcript_9::g.13::m.13;Parent=Gene.13::Transcript_9::g.13::m.13

What went wrong?
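To see how much of a merged GFF3 file is affected, a small check along these lines may help. It is only a sketch: it assumes the file is tab-delimited (as GFF3 normally is) and that an affected record has `.` or nothing in column 1. The function name `count_missing_seqids` and the abbreviated sample records are made up for illustration; the `##gff-version` line is an assumed header form and is skipped as a comment either way.

```python
import io


def count_missing_seqids(handle):
    """Return (missing, named): feature lines without / with a seqid in column 1."""
    missing, named = 0, 0
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and directive/comment lines such as the version header.
        if not line.strip() or line.startswith("#"):
            continue
        # Column 1 (seqid) is everything before the first tab.
        seqid = line.split("\t", 1)[0].strip()
        if seqid in ("", "."):
            missing += 1
        else:
            named += 1
    return missing, named


# Abbreviated, tab-delimited versions of the two kinds of records quoted above.
sample = "\n".join([
    "##gff-version 3.2.1",
    ".\tHMMER\tprotein_hmm_match\t1000\t1035\t11000.0\t.\t.\tID=homology:12545;Name=LRR_1",
    "Transcript_9\ttransdecoder\tCDS\t6481\t6729\t.\t+\t0\tID=cds.Gene.13::Transcript_9::g.13::m.13",
])

missing, named = count_missing_seqids(io.StringIO(sample))
print(missing, named)
```

On a real run, the same function can be given an open file handle for the final gff3 in place of the `io.StringIO` sample.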

aburzynski Jun 15 '16 15:06

Curious! Can you give me your dammit version info (dammit --version), a little snippet of the input FASTA (head INPUT.fa), and a bit of your log file (tail -n 200 ~/.dammit/log/dammit-all.log)?

camillescott Jun 22 '16 22:06

$ dammit --version
dammit 0.3

$ head tr-pro.fa

>TRINITY_DN14884_c0_g1_i1 len=407 path=[655:0-292 656:293-349 657:350-406] [-1, 655, 656, 657, -2]
GTTATTTTAATGGATGACAAAGTAACATTTCCAAAATGCTAATATGCTCTTTTGTCGATA
AATTTGACCCTTTGACCTCTAAATGTGACAATCACTTGAGGAAGGTTGCAGACCTGGATT
ATCAGGATAAAAAAATCAATCTGCTTGACTGGACTAACCCTCTACCATCTCCTTATGGCC
CATATTGCACAGGACCAAACAATAAAGTTTGGACTTGTAATACACCATACTGCCCCAAAA
CAGGTAGAGTTAGGCACACATGACTGCACTTATTCTCTACCATCTCCTTATGGAGAGAAT
GGAATGGAGTTGTTATATGACATGCTGTATTCTCTACCATCTCCTTATGGAGAGAATGGA
ATGGAGTTGTTATATGACATGCTGTATTCTCTACCATCTCCTTATGG
>TRINITY_DN14856_c0_g1_i1 len=362 path=[679:0-361] [-1, 679, -2]
GGAAAGTGTTGTCCCGGATGCCTGTGTGGACTGCACAGGCTAATCTGGGACGACACGTAA

log.txt

aburzynski Jun 23 '16 12:06

Hi, I am having the same issue with the GFF3 output file. Transcript names are missing in the first column for the HMMER results, but not for later lines e.g. transdecoder. Any word on resolving this issue? I'd really like to use the program.

Here are the details:

Version: dammit 0.3

head input.fasta

>TRINITY_DN7361_c0_g1_i1 len=890
ATCTGAGTTATCTGTAGAGACAGTGGAGACAAAGTGAATTAGGGTGAGATGCTTGAAATG
CCCCCCCTTCACCTCCATTATTCAGCCCCTCCCTACAGACCCACACACACTCTCGTATAC
ACAGATGTCTATATATGTGACACTCCTCCACCATCTCAGTCTCACAGTAGCCAATGAAAT
GAGAGCTGCTTGATTTTAGCACCAACACCCAGCTCACGTCATCAGGTTCCCCCCACCCCA
GCCCCCACACCCTCTCCTCCCATGTGTGTGTGCAGTGTGTGTCCTCCCATCTATGTCTGC
ATGCCTGACAGGATAAATCCAGCCCCCCCCCATACTCTCTCTCTCTCTTTCTCCCTGGCT
CCGTCACTCACACACACACACCCCTACACACACACAGGTGCTGCTCTTTGGCAAGCCTAA
CTCTTACACACACAAAACACACAAACTCAAGAGAGGGTATTAGTTGGTCCACTTTTGTGG
TGCTGAGGAACTTGATATTTTGCCTTCATGGAGGATTACCACAAACCTGACCAGCAGACA

tail -n 200 ~/.dammit/log/dammit-all.log

10-14 23:10:45 dammit.annotate:write:85 [INFO]

[ ] maf_best_hits:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.x.orthodb.maf-trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.x.orthodb.maf.best.csv

10-14 23:12:33 dammit.annotate:write:85 [INFO]

[ ] maf-gff3:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.x.orthodb.maf.gff3

10-14 23:12:55 dammit.annotate:write:85 [INFO]

[ ] hmmscan-gff3:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.pfam.csv.gff3

10-14 23:14:16 dammit.annotate:write:85 [INFO]

[ ] cmscan-gff3:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.rfam.tbl.gff3

10-14 23:14:17 dammit.annotate:write:85 [INFO]

[ ] gff3-merge:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.dammit.gff3

10-14 23:14:35 dammit.annotate:write:85 [INFO]

[ ] fasta-annotate:trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr-small-header.cd-hit-0.98.fasta.dammit.fasta

10-14 23:32:40 AnnotateHandler:run_tasks:123 [DEBUG]

chdir: /scratch/bluehead_wrasse/transcriptome/trinity_v2.2.0_TBK14_05-08-2016-IPTPFsubset-corr/per_sample_error_corrected_k23/Trinity-output/analysis/annotation/dammit

10-17 13:25:01 root:run:69 [DEBUG]

*** dammit! begin ***

ericavt Oct 17 '16 00:10

I haven't managed to reproduce this, and unfortunately the logs haven't been very useful. Could you give me as much of the transcriptome fasta as you can reasonably send, so that I can try to run it on my machine? I can think of a couple places where things could go wrong, but unfortunately, the one transcript doesn't seem to trigger it (which kinda shoots down my "my regex sucks" idea).

camillescott Oct 20 '16 23:10

Camille. I am having this bug, too. Can help send you a fasta if you still need one.

macmanes Oct 28 '16 12:10

That'd be helpful! :)

Camille Scott

Graduate Group for Computer Science Lab for Data Intensive Biology University of California, Davis

camillescott Oct 28 '16 14:10

Pleased to report that after a serendipitous sit-down with @butterflyology, we were able to reproduce this error on his data and I now have everything I need to fix it! So, should be good to go soon enough.

camillescott Nov 03 '16 23:11

Is this an easy fix on my end? I have a couple of transcriptomes waiting to run.

macmanes Nov 10 '16 13:11

Any update here?

macmanes Nov 21 '16 14:11

The issue is fixed in the 1.0 beta branch; I'll put up some instructions for upgrading shortly.

camillescott Nov 22 '16 03:11