graftM create error
I am trying to build a graftM package for the rpsB gene from the GTDB r89 species-level genomes. I downloaded the unaligned bacterial and archaeal marker gene sequences and concatenated them (i.e. TIGR01011.fna and TIGR01012.fna). I then modified the taxonomy files to match the contig headers in the sequence FASTA.
Next, I attempted to create the packages by
graftM create --sequences rpsB_bac_arc.fna --taxonomy rpsB_bac_arc_taxonomy.tsv --threads 24
But I ran into this error
Traceback (most recent call last):
  File "/home/projects/11001755/miniconda3/envs/singlem/bin/graftM", line 415, in <module>
    Run(args).main()
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/run.py", line 681, in main
    threads = self.args.threads
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/create.py", line 636, in main
    align_hmm, output_alignment, threads)
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/create.py", line 454, in _align_and_create_hmm
    output_alignment)
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/graftm/create.py", line 143, in _get_hmm_from_alignment
    output = extern.run(cmd)
  File "/home/projects/11001755/miniconda3/envs/singlem/lib/python3.6/site-packages/extern/__init__.py", line 41, in run
    raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command hmmbuild -O /dev/stdout -o /dev/stderr '/var/tmp/pbs.71118.wlm01/graftmbcgozapa_align.hmm' '/var/tmp/pbs.71118.wlm01/graftm3k540rg3.aln.faa' returned non-zero exit status 7.
STDERR was: b'Alignment input parse error:\n sequence GCF_002135345.1 has alen 5348; expected 25270\n while reading aligned FASTA file /var/tmp/pbs.71118.wlm01/graftm3k540rg3.aln.faa\n at or near line 7171633\n# hmmbuild :: profile HMM construction from multiple sequence alignments\n# HMMER 3.3 (Nov 2019); http://hmmer.org/\n# Copyright (C) 2019 Howard Hughes Medical Institute.\n# Freely distributed under the BSD open source license.\n# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\n# input alignment file: /var/tmp/pbs.71118.wlm01/graftm3k540rg3.aln.faa\n# output HMM file: /var/tmp/pbs.71118.wlm01/graftmbcgozapa_align.hmm\n# output directed to file: /dev/stderr\n# processed alignment resaved to: /dev/stdout\n# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\n\n# idx name nseq alen mlen W eff_nseq re/pos description\n#---- -------------------- ----- ----- ----- ----- -------- ------ -----------\n'STDOUT was: b''
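The hmmbuild message above says that one row of the aligned FASTA (GCF_002135345.1) has a different length from the one it expected. A minimal sketch for checking this yourself on any aligned FASTA file you have kept is shown here. The function names are hypothetical and not part of graftM or HMMER, and taking the first record's length as the reference is an assumption about what "expected" means in the message.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_length_mismatches(handle):
    """Return (reference_length, [(name, length), ...]) for rows whose
    aligned length differs from that of the first record.

    Assumption: the first record defines the reference length.
    """
    expected = None
    mismatches = []
    for name, seq in read_fasta(handle):
        if expected is None:
            expected = len(seq)
        elif len(seq) != expected:
            mismatches.append((name, len(seq)))
    return expected, mismatches


# Tiny in-memory example; replace with open("your_alignment.faa") for a real file.
example = io.StringIO(">seq1\nMK-V\nLA\n>seq2\nMKAVLA\n>seq3\nMKA\n")
expected, mismatches = find_length_mismatches(example)
print(expected, mismatches)
```

Any names reported in the mismatch list are rows that would likely trigger the same parse error in hmmbuild.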
Hi,
Doesn't sound like you are doing anything wrong on first look, but can you make the input files available? I'm on leave and won't be able to get to this for 2 months or so. @geronimp ?
From: Aditya Bandla Sent: Tuesday, May 12, 2020 6:12:41 AM To: geronimp/graftM Cc: Subscribed Subject: [geronimp/graftM] graftM create error (#265)
Hey, happy to take this on. Please provide the taxonomy and sequence files and I'll take a look.
Here are the files: rpsB_bac_arc.txt and rpsB_bac_arc_taxonomy.txt
@geronimp I'm trying to create this package for a ribosomal protein, and I'm providing the trimmed protein MSA using the --alignment flag. In this case, does graftM expect the --sequences to be nucleotides or proteins?
Seems to work when I use the aa sequences, but not nt
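Since the behaviour differs between amino acid and nucleotide input, it can help to confirm which alphabet a given FASTA file actually contains before passing it to --sequences. The sketch below is a rough heuristic only: the function name and the 0.9 threshold are assumptions for illustration, and this is not how graftM itself decides.

```python
def guess_alphabet(sequence, threshold=0.9):
    """Guess whether a sequence is nucleotide or protein.

    Heuristic (an assumption, not graftM's logic): if at least `threshold`
    of the non-gap characters are A, C, G, T, U or N, call it nucleotide.
    """
    residues = [c for c in sequence.upper() if c not in "-.* \n"]
    if not residues:
        return "unknown"
    nucleotide_like = sum(1 for c in residues if c in "ACGTUN")
    if nucleotide_like / len(residues) >= threshold:
        return "nucleotide"
    return "protein"


# Example calls on short made-up sequences.
print(guess_alphabet("ATGGCNNTA"))
print(guess_alphabet("MKVLAAGIRW"))
```

Short or low-complexity protein sequences can fool a check like this, so it is best applied to several records rather than one.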
Hi Aditya,
This error seems to occur after the graftM package has been created and while it's being tested, and only when the suffix .txt is used for the input sequences. When the file is renamed to have the FASTA suffix .fna, the graftM create command runs fine using version 0.13.1. Please give this a go and let me know if you run into any errors:
graftM create --sequences rpsB_bac_arc.fna --taxonomy rpsB_bac_arc_taxonomy.txt
In the meantime, this isn't a very helpful error message, so I'll make some changes so that a more informative error is given in the future.
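For anyone hitting the same problem with a .txt sequence file, the renaming step suggested above can be sketched as follows. The helper names are hypothetical, the file is copied rather than renamed so the original is kept, and the command uses only the flags already shown in this thread. The command list is built but not executed here.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_suffix(src, suffix=".fna"):
    """Copy `src` next to itself with a new suffix and return the new path."""
    src = Path(src)
    dst = src.with_suffix(suffix)
    if dst != src:
        shutil.copyfile(src, dst)
    return dst


def build_create_command(sequences, taxonomy):
    """Build the graftM create argument list (flags as used in this thread)."""
    return ["graftM", "create",
            "--sequences", str(sequences),
            "--taxonomy", str(taxonomy)]


# Demonstration in a temporary directory with a placeholder sequence file.
with tempfile.TemporaryDirectory() as tmp:
    txt = Path(tmp) / "rpsB_bac_arc.txt"
    txt.write_text(">example\nATGC\n")
    fna = copy_with_suffix(txt)
    print(build_create_command(fna.name, "rpsB_bac_arc_taxonomy.txt"))
```

The resulting list can be passed to subprocess.run once the paths point at the real files.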