GTDBTk
Problem with empty files (such as produced by metabat)
Metabat2 produces a few files during binning that can be empty, such as:
sample.lowDepth.fa
sample.tooShort.fa
GTDB-Tk therefore fails when processing the folder containing the binned .fa files, because Mash raises an error on empty files. Hopefully it is an easy fix to drop zero-size files from the list of files to process; it would save me from adding a workaround to my pipeline.
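Until a fix lands, one possible workaround is to skip the zero-size files before GTDB-Tk sees them. The sketch below is not GTDB-Tk code; the helper names `non_empty_fastas` and `write_batchfile` and the output name `genomes.tsv` are made up for illustration. It assumes the two-column, tab-separated batch file format (FASTA path, genome ID) that `gtdbtk classify_wf --batchfile` accepts, and it only checks file size, not whether the file actually contains FASTA records.

```python
from pathlib import Path


def non_empty_fastas(genome_dir, extension="fa"):
    """Return sorted paths of *.<extension> files that are larger than zero bytes."""
    return sorted(
        p for p in Path(genome_dir).glob(f"*.{extension}")
        if p.is_file() and p.stat().st_size > 0
    )


def write_batchfile(fasta_paths, batchfile):
    """Write one 'path<TAB>genome_id' line per genome and return the genome count.

    The genome ID is the file name without its final extension
    (an illustrative choice, not a GTDB-Tk requirement).
    """
    with open(batchfile, "w") as handle:
        for path in fasta_paths:
            handle.write(f"{path}\t{path.stem}\n")
    return len(fasta_paths)
```

Calling `write_batchfile(non_empty_fastas("scratch/takada/metabat", "fa"), "genomes.tsv")` and then passing `--batchfile genomes.tsv` in place of `--genome_dir ... -x fa` should leave out files such as sample.lowDepth.fa whenever they are empty.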
GTDB-Tk log:
[2023-08-02 09:44:22] INFO: GTDB-Tk v2.3.0
[2023-08-02 09:44:22] INFO: gtdbtk classify_wf --genome_dir scratch/takada/metabat/ -x fa --out_dir scratch/takada/annotation/ --cpus 32 --prefix takada --mash_db scratch/takada/annotation/takada.msh
[2023-08-02 09:44:22] INFO: Using GTDB-Tk reference data version r214: /nfs/nas22/fs2201/biol_micro_unix_modules/modules/software/GTDB-Tk/2.3.0-foss-2020b/data
[2023-08-02 09:44:22] INFO: Loading reference genomes.
[2023-08-02 09:44:22] INFO: Using Mash version 2.3
[2023-08-02 09:44:22] INFO: Creating Mash sketch file: scratch/takada/annotation/classify/ani_screen/intermediate_results/mash/takada.user_query_sketch.msh
[2023-08-02 09:44:22] INFO: Completed 2 genomes in 0.01 seconds (195.71 genomes/second).
[2023-08-02 09:44:22] ERROR: Error generating Mash sketch:
[2023-08-02 09:44:22] ERROR: Controlled exit resulting from an unrecoverable error or warning.
Mash log (edited) for command mash sketch -l -p 32 <(ls scratch/takada/metabat/*fa) -o scratch/takada/annotation/takada.msh -k 16 -s 5000:
<lots of files that work>
ERROR: Did not find fasta records in "input files".
Hello, thanks for your feedback. We will add a test to disregard any empty genome files. This will be available in the next GTDB-Tk release.