
Problem with empty files (such as produced by metabat)

Open cmfield opened this issue 1 year ago • 1 comment

Metabat2 produces a few files during binning that can be empty, such as:
sample.lowDepth.fa
sample.tooShort.fa

GTDB-Tk therefore fails when processing the folder containing the binned .fa files, because Mash raises an error for empty files. I hope it would be an easy fix to remove zero-size files from the list of files to process; it would save me a fix to my pipeline anyway.

GTDB-Tk log:

[2023-08-02 09:44:22] INFO: GTDB-Tk v2.3.0
[2023-08-02 09:44:22] INFO: gtdbtk classify_wf --genome_dir scratch/takada/metabat/ -x fa --out_dir scratch/takada/annotation/ --cpus 32 --prefix takada --mash_db scratch/takada/annotation/takada.msh
[2023-08-02 09:44:22] INFO: Using GTDB-Tk reference data version r214: /nfs/nas22/fs2201/biol_micro_unix_modules/modules/software/GTDB-Tk/2.3.0-foss-2020b/data
[2023-08-02 09:44:22] INFO: Loading reference genomes.
[2023-08-02 09:44:22] INFO: Using Mash version 2.3
[2023-08-02 09:44:22] INFO: Creating Mash sketch file: scratch/takada/annotation/classify/ani_screen/intermediate_results/mash/takada.user_query_sketch.msh
[2023-08-02 09:44:22] INFO: Completed 2 genomes in 0.01 seconds (195.71 genomes/second).
[2023-08-02 09:44:22] ERROR: Error generating Mash sketch:
[2023-08-02 09:44:22] ERROR: Controlled exit resulting from an unrecoverable error or warning.

Mash log (edited) for command mash sketch -l -p 32 <(ls scratch/takada/metabat/*fa) -o scratch/takada/annotation/takada.msh -k 16 -s 5000:

<lots of files that work>
ERROR: Did not find fasta records in "input files".
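
Until a fix lands, a possible workaround is to filter out the zero-size files before GTDB-Tk sees them. The sketch below is only an illustration, not GTDB-Tk's own code: the function names `non_empty_genomes` and `write_batchfile` and the output name `genomes.tsv` are hypothetical, and it assumes the two-column, tab-separated batchfile layout (FASTA path, genome ID) that GTDB-Tk accepts via `--batchfile`. It only checks file size, so a file that is non-empty but contains no FASTA records would still get through.

```python
from pathlib import Path


def non_empty_genomes(genome_dir, extension="fa"):
    """Return sorted paths of files with the given extension whose size is > 0 bytes."""
    return sorted(
        p for p in Path(genome_dir).glob(f"*.{extension}")
        if p.is_file() and p.stat().st_size > 0
    )


def write_batchfile(genomes, batchfile):
    """Write one line per genome: FASTA path, a tab, then the genome ID.

    The genome ID is the file name without its final extension (hypothetical choice).
    """
    with open(batchfile, "w") as handle:
        for path in genomes:
            handle.write(f"{path}\t{path.stem}\n")


if __name__ == "__main__":
    # Directory taken from the log above; "genomes.tsv" is an arbitrary name.
    genomes = non_empty_genomes("scratch/takada/metabat", extension="fa")
    write_batchfile(genomes, "genomes.tsv")
    print(f"Kept {len(genomes)} non-empty genome files")
```

The resulting file could then be passed with `--batchfile genomes.tsv` in place of `--genome_dir scratch/takada/metabat/ -x fa`, assuming the rest of the command stays the same.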

cmfield avatar Aug 02 '23 07:08 cmfield

Hello, thanks for your feedback. We will add a test to disregard any empty genome files. This will be available in the next GTDB-Tk release.

pchaumeil avatar Aug 04 '23 03:08 pchaumeil