graftM create error

Open adityabandla opened this issue 4 years ago • 6 comments

I am trying to build a graftM package for the rpsB gene from the GTDB r89 species-level genomes. I downloaded the unaligned bacterial and archaeal marker gene sequences and concatenated them (i.e. TIGR01011.fna and TIGR01012.fna), then modified the taxonomy files to match the contig headers in the sequence FASTA.

Next, I attempted to create the packages by

graftM create --sequences rpsB_bac_arc.fna --taxonomy rpsB_bac_arc_taxonomy.tsv --threads 24

But I ran into this error

Traceback (most recent call last):
  File "/home/projects/11001755/miniconda3/envs/singlem/bin/graftM", line 415, in <module>
    Run(args).main()
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/run.py", line 681, in main
    threads = self.args.threads
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/create.py", line 636, in main
    align_hmm, output_alignment, threads)
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/create.py", line 454, in _align_and_create_hmm
    output_alignment)
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/create.py", line 143, in _get_hmm_from_alignment
    output = extern.run(cmd)
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/extern/__init__.py", line 41, in run
    raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command hmmbuild -O /dev/stdout -o /dev/stderr '/var/tmp/pbs.71118.wlm01/graftmbcgozapa_align.hmm' '/var/tmp/pbs.71118.wlm01/graftm3k540rg3.aln.faa' returned non-zero exit status 7.
STDERR was: b'Alignment input parse error:\n   sequence GCF_002135345.1 has alen 5348; expected 25270\n   while reading aligned FASTA file /var/tmp/pbs.71118.wlm01/graftm3k540rg3.aln.faa\n   at or near line 7171633\n# hmmbuild :: profile HMM construction from multiple sequence alignments\n# HMMER 3.3 (Nov 2019); http://hmmer.org/\n# Copyright (C) 2019 Howard Hughes Medical Institute.\n# Freely distributed under the BSD open source license.\n# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\n# input alignment file:             /var/tmp/pbs.71118.wlm01/graftm3k540rg3.aln.faa\n# output HMM file:                  /var/tmp/pbs.71118.wlm01/graftmbcgozapa_align.hmm\n# output directed to file:          /dev/stderr\n# processed alignment resaved to:   /dev/stdout\n# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\n\n# idx name                  nseq  alen  mlen     W eff_nseq re/pos description\n#---- -------------------- ----- ----- ----- ----- -------- ------ -----------\n'STDOUT was: b''
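
The hmmbuild message above ("has alen 5348; expected 25270") says that the rows of the aligned FASTA file do not all have the same length. A minimal sketch for locating such rows in any aligned FASTA file follows. The helper names `read_fasta` and `mismatched_rows` and the toy data are illustrative and not part of graftM or HMMER. The most common row length is used as the reference, which may differ from the length hmmbuild itself reports as expected.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def mismatched_rows(lines):
    """Return (reference_length, [(name, length), ...]) for rows whose
    length differs from the most common row length (an assumption; hmmbuild
    may pick its reference length differently)."""
    lengths = [(name, len(seq)) for name, seq in read_fasta(lines)]
    if not lengths:
        return 0, []
    reference = Counter(n for _, n in lengths).most_common(1)[0][0]
    return reference, [(name, n) for name, n in lengths if n != reference]


# Toy alignment (made-up data): the third row is shorter than the others.
example = [">a", "AC-GT", ">b", "ACGGT", ">c", "ACG"]
reference, bad = mismatched_rows(example)
print(reference, bad)
```

For a real file, an open file handle can be passed in place of the list, e.g. `mismatched_rows(open(path))`, where `path` points to a copy of the intermediate alignment, if one is still available.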

adityabandla avatar May 11 '20 20:05 adityabandla

Hi,

Doesn't sound like you are doing anything wrong on first look, but can you make the input files available? I'm on leave and won't be able to get to this for 2 months or so. @geronimp ?


wwood avatar May 12 '20 00:05 wwood

Hey, happy to take this on. Please provide the taxonomy and sequences files and I'll take a look.

geronimp avatar May 12 '20 03:05 geronimp

Here are the files: rpsB_bac_arc.txt and rpsB_bac_arc_taxonomy.txt

adityabandla avatar May 12 '20 13:05 adityabandla

@geronimp I'm trying to create this package for a ribosomal protein, and I'm providing the trimmed protein MSA using the --alignment flag. In this case, does graftM expect the --sequences file to contain nucleotide or protein sequences?

adityabandla avatar May 14 '20 19:05 adityabandla

It seems to work when I use the amino acid (aa) sequences, but not the nucleotide (nt) ones.
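
Before passing files to --sequences and --alignment, it can help to confirm that both are in the same alphabet. The following is a rough heuristic sketch, not a statement of what graftM expects: `guess_alphabet` is a hypothetical helper and the 0.9 threshold is an arbitrary assumption. It labels a sequence as nucleotide when most of its letters are A, C, G, T, U or N.

```python
def guess_alphabet(seq, threshold=0.9):
    """Guess whether a sequence is nucleotide or protein.

    Heuristic only: gap and other non-letter characters are ignored, and the
    threshold is an assumed cut-off, not a value taken from graftM.
    """
    residues = [c for c in seq.upper() if c.isalpha()]
    if not residues:
        return "unknown"
    nucleotide_like = sum(c in "ACGTUN" for c in residues)
    if nucleotide_like / len(residues) >= threshold:
        return "nucleotide"
    return "protein"


# Made-up example sequences.
print(guess_alphabet("ATGGCTAAAGTT"))
print(guess_alphabet("MAKVSLEQLLEAG"))
```

Short or low-complexity protein sequences can be misclassified, so the result is best checked across several records rather than one.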

adityabandla avatar May 14 '20 21:05 adityabandla

Hi Aditya,

This error seems to occur after the graftM package has been created and while it is being tested, and only when the suffix .txt is used for the input sequences. When the file is renamed to have the FASTA suffix .fna, the graftM create command runs fine using version 0.13.1. Please give this a go and let me know if you run into any errors:

graftM create --sequences rpsB_bac_arc.fna --taxonomy rpsB_bac_arc_taxonomy.txt

In the meantime, this isn't a very helpful error message, so I'll make some changes so that a more informative error is given in the future.

geronimp avatar May 14 '20 23:05 geronimp