SemiBin
Database Download & Documentation
G'day @psj1997, @luispedro and other SemiBin developers,
Firstly, thanks for SemiBin(2). It works amazingly well and recovers many more bins than the other binning methods I have tried :)
I want to share some feedback regarding the database download and SemiBin's documentation.
The HPC cluster I use at my institution blocks internet access on compute nodes. As a result, the lazy download of the SemiBin2 database failed when I ran the command below (SemiBin v1.5.1, Linux installation via bioconda).
SemiBin2 multi_easy_bin -i {input.catalogue} -b {input.bams} -o {params.outdir} -s {params.separator} --minfasta-kbs {params.minfasta}
It was difficult for me to work out that the missing database was in fact the cause, because the database is not mentioned in the README (only in the FAQ section of the docs) and the error message wasn't informative (apologies, I have overwritten the log file or I would quote it).
I then tried following the FAQ to download the updated GTDB database. The following command does not work in MMseqs2 v13.45111 (it fails with this known MMseqs2 error: https://github.com/soedinglab/MMseqs2/issues/561):
mmseqs databases GTDB GTDB tmp
Then, after looking at the SemiBin codebase, I was able to install the database manually:
wget 'https://zenodo.org/record/4751564/files/GTDB_v95.tar.gz?download=1'
mv 'GTDB_v95.tar.gz?download=1' GTDB_v95.tar.gz
tar -xzvf GTDB_v95.tar.gz
and went from there, specifying -r {params.db}, and then SemiBin worked perfectly.
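For anyone else on a cluster without compute node internet access, the workaround above could be wrapped up roughly like this. It is only a sketch: DB_DIR, fetch_db and db_ready are names I made up for illustration, and the check only confirms that the directory is non-empty, not that the database is complete. The URL is the same Zenodo link as above.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround: fetch the database on a node WITH internet
# access, then point SemiBin at it with -r on the offline compute node.
# DB_DIR, fetch_db and db_ready are hypothetical names, not part of SemiBin.

DB_DIR="${DB_DIR:-$HOME/semibin_db}"
DB_URL='https://zenodo.org/record/4751564/files/GTDB_v95.tar.gz?download=1'

# Run once on a login node. -O avoids the "?download=1" suffix in the file name.
fetch_db() {
    mkdir -p "$DB_DIR" || return 1
    wget -O "$DB_DIR/GTDB_v95.tar.gz" "$DB_URL" || return 1
    tar -xzf "$DB_DIR/GTDB_v95.tar.gz" -C "$DB_DIR"
}

# Succeeds only if the given directory exists and contains at least one entry.
db_ready() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}
```

In the job script, something like `db_ready "$DB_DIR" || { echo "SemiBin database not found in $DB_DIR" >&2; exit 1; }` before the SemiBin2 call would at least turn the uninformative failure into a clear message. The exact path to pass to -r depends on how the archive unpacks, so I have left that out.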
So perhaps either including a specific --download_database flag or script, or just documenting a manual install method, would help future users like me without compute node internet access.
George