Cannot reach https://busco-data.ezlab.org/v5/data/file_versions.tsv
Description of the bug
Hello, when I start nf-core/mag, it runs for some time and then stops with:
ERROR: BUSCO analysis failed for some unknown reason! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err.
See the attached error log. BUSCO already has a fixed issue for this problem (https://gitlab.com/ezlab/busco/-/issues/567), which is why I am posting here first. Tell me to go away if you think they should reopen that issue.
I tried to access https://busco-data.ezlab.org/v5/data/file_versions.tsv with wget and curl and had no problem downloading it from the machine this runs on. Therefore I don't think this is a firewall issue of some sort, but I could be wrong. After all, I don't know exactly how BUSCO tries to download the file.
At first I also thought this might be just an internet hiccup, so I resumed the analysis after testing whether I could download this file. That was not the case: the error occurs every time I resume.
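Since wget and curl succeed, one way to narrow this down is to repeat the download from Python inside the conda environment that the BUSCO process uses. The sketch below assumes that BUSCO fetches the file through Python's standard `urllib`; that is an assumption, not something verified in this report. `probe_url` is a hypothetical helper name. The function reports the underlying exception, so a certificate or proxy problem can be told apart from a plain network outage.

```python
import urllib.error
import urllib.request
from typing import Tuple

# URL taken from the error message in this report.
BUSCO_VERSIONS_URL = "https://busco-data.ezlab.org/v5/data/file_versions.tsv"


def probe_url(url: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Try to fetch `url` with urllib and describe the outcome.

    Returns (True, summary) on success or (False, reason) on failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
        return True, f"fetched {len(body)} bytes"
    except urllib.error.URLError as exc:
        # Covers DNS failures, refused connections, proxy problems,
        # HTTP error statuses and certificate verification errors.
        return False, f"URLError: {exc.reason!r}"
    except (OSError, ValueError) as exc:
        # Covers timeouts while reading and malformed URLs.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `probe_url(BUSCO_VERSIONS_URL)` with the Python interpreter of the BUSCO conda environment and printing the result would show whether Python itself can reach the server. If it fails while curl works, differing proxy settings or CA certificates between the two are a plausible cause, though not the only one.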
Thanks for your help
Christoph
Command used and terminal output
nextflow run nf-core/mag -profile conda --input '../data/*_R{1,2}.fastq.gz' --outdir results -r fix-convert-depths-gzip -resume
N E X T F L O W ~ version 22.04.5
Launching `https://github.com/nf-core/mag` [elated_stonebraker] DSL2 - revision: 1b4456d542 [fix-convert-depths-gzip]
------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
nf-core/mag v2.3.0dev
------------------------------------------------------
Core Nextflow options
revision : fix-convert-depths-gzip
runName : elated_stonebraker
launchDir : /media/NGS/nf-core-workflow
workDir : /media/NGS/nf-core-workflow/work
projectDir : /home/hummelchen/.nextflow/assets/nf-core/mag
userName : hummelchen
profile : conda
configFiles : /home/hummelchen/.nextflow/assets/nf-core/mag/nextflow.config
Input/output options
input : ../data/*_R{1,2}.fastq.gz
outdir : results
Generic options
enable_conda : true
Quality control for short reads options
phix_reference : /home/hummelchen/.nextflow/assets/nf-core/mag/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz
Quality control for long reads options
lambda_reference: /home/hummelchen/.nextflow/assets/nf-core/mag/assets/data/GCA_000840245.1_ViralProj14204_genomic.fna.gz
Taxonomic profiling options
gtdb : https://data.ace.uq.edu.au/public/gtdb/data/releases/release202/202.0/auxillary_files/gtdbtk_r202_data.tar.gz
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/mag for your analysis please cite:
* The pipeline publication
https://doi.org/10.1093/nargab/lqac007
* The pipeline
https://doi.org/10.5281/zenodo.3589527
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/mag/blob/master/CITATIONS.md
------------------------------------------------------
executor > local (7)
[47/9be65c] process > NFCORE_MAG:MAG:FASTQC_RAW (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 1 of 1, cached: 1 ✔
[3d/396de6] process > NFCORE_MAG:MAG:FASTP (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 1 of 1, cached: 1 ✔
[ac/adbb55] process > NFCORE_MAG:MAG:BOWTIE2_PHIX_REMOVAL_BUILD (GCA_002596845.1_ASM259684v1_genomic.fna.gz) [100%] 1 of 1, cached: 1 ✔
[16/88f95a] process > NFCORE_MAG:MAG:BOWTIE2_PHIX_REMOVAL_ALIGN (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 1 of 1, cached: 1 ✔
[9b/ec6fb9] process > NFCORE_MAG:MAG:FASTQC_TRIMMED (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 1 of 1, cached: 1 ✔
[- ] process > NFCORE_MAG:MAG:NANOPLOT_RAW -
[- ] process > NFCORE_MAG:MAG:PORECHOP -
[- ] process > NFCORE_MAG:MAG:NANOLYSE -
[- ] process > NFCORE_MAG:MAG:FILTLONG -
[- ] process > NFCORE_MAG:MAG:NANOPLOT_FILTERED -
[- ] process > NFCORE_MAG:MAG:CENTRIFUGE_DB_PREPARATION -
[- ] process > NFCORE_MAG:MAG:CENTRIFUGE -
[- ] process > NFCORE_MAG:MAG:KRAKEN2_DB_PREPARATION -
[- ] process > NFCORE_MAG:MAG:KRAKEN2 -
[37/8a2ffc] process > NFCORE_MAG:MAG:MEGAHIT (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 1 of 1, cached: 1 ✔
[8a/bf0dd1] process > NFCORE_MAG:MAG:SPADES (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 1 of 1, cached: 1 ✔
[- ] process > NFCORE_MAG:MAG:SPADESHYBRID -
[3c/1903eb] process > NFCORE_MAG:MAG:QUAST (MEGAHIT-NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[6b/450699] process > NFCORE_MAG:MAG:PRODIGAL (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[bd/0fff10] process > NFCORE_MAG:MAG:BINNING_PREPARATION:BOWTIE2_ASSEMBLY_BUILD (MEGAHIT-NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[ff/266e2f] process > NFCORE_MAG:MAG:BINNING_PREPARATION:BOWTIE2_ASSEMBLY_ALIGN (MEGAHIT-NG-30689_QN1_4_3_lib613328_10075_2-NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[cd/528041] process > NFCORE_MAG:MAG:BINNING:METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[e7/d37f31] process > NFCORE_MAG:MAG:BINNING:CONVERT_DEPTHS (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[87/0a8ee1] process > NFCORE_MAG:MAG:BINNING:METABAT2_METABAT2 (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[54/c0b9eb] process > NFCORE_MAG:MAG:BINNING:MAXBIN2 (NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[2f/482f33] process > NFCORE_MAG:MAG:BINNING:ADJUST_MAXBIN2_EXT (MEGAHIT-NG-30689_QN1_4_3_lib613328_10075_2) [100%] 2 of 2, cached: 2 ✔
[f9/a7820f] process > NFCORE_MAG:MAG:BINNING:SPLIT_FASTA (MEGAHIT-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2) [100%] 4 of 4, cached: 4 ✔
[af/95fa5c] process > NFCORE_MAG:MAG:BINNING:GUNZIP_BINS (MEGAHIT-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2.023.fa.gz) [100%] 106 of 106, cached: 106 ✔
[- ] process > NFCORE_MAG:MAG:BINNING:GUNZIP_UNBINS -
[b3/6f4c47] process > NFCORE_MAG:MAG:BINNING:MAG_DEPTHS (MEGAHIT-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2) [100%] 4 of 4, cached: 4 ✔
[- ] process > NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_PLOT -
[2f/5d9547] process > NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_SUMMARY [100%] 1 of 1, cached: 1 ✔
[c2/76795c] process > NFCORE_MAG:MAG:BUSCO_QC:BUSCO (MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa) [ 0%] 1 of 106, failed: 1
[- ] process > NFCORE_MAG:MAG:BUSCO_QC:BUSCO_PLOT -
[- ] process > NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY -
[1f/0f9fa9] process > NFCORE_MAG:MAG:QUAST_BINS (SPAdes-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2) [100%] 4 of 4, cached: 4 ✔
[04/96c715] process > NFCORE_MAG:MAG:QUAST_BINS_SUMMARY [100%] 1 of 1, cached: 1 ✔
[- ] process > NFCORE_MAG:MAG:CAT -
[d3/503ebe] process > NFCORE_MAG:MAG:GTDBTK:GTDBTK_DB_PREPARATION (gtdbtk_r202_data.tar.gz) [100%] 1 of 1, cached: 1 ✔
[- ] process > NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFY -
[- ] process > NFCORE_MAG:MAG:GTDBTK:GTDBTK_SUMMARY -
[- ] process > NFCORE_MAG:MAG:BIN_SUMMARY -
[37/3b84e6] process > NFCORE_MAG:MAG:PROKKA (MEGAHIT-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2.017) [ 94%] 100 of 106, cached: 100
[- ] process > NFCORE_MAG:MAG:CUSTOM_DUMPSOFTWAREVERSIONS -
[- ] process > NFCORE_MAG:MAG:MULTIQC -
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'NFCORE_MAG:MAG:BUSCO_QC:BUSCO (MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa)'
Caused by:
Process `NFCORE_MAG:MAG:BUSCO_QC:BUSCO (MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa)` terminated with an error exit status (1)
Command executed:
# ensure augustus has write access to config directory
if [ N = "Y" ] ; then
cp -r /usr/local/config/ augustus_config/
export AUGUSTUS_CONFIG_PATH=augustus_config
fi
# place db in extra folder to ensure BUSCO recognizes it as path (instead of downloading it)
if [ N = "Y" ] ; then
mkdir dataset
mv dataset/
fi
# set nullgob: if pattern matches no files, expand to a null string rather than to itself
shopt -s nullglob
# only used for saving busco downloads
most_spec_db="NA"
if busco --auto-lineage --mode genome --in MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa --cpu "8" --out "BUSCO" > MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log 2> MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err; then
# get name of used specific lineage dataset
summaries=(BUSCO/short_summary.specific.*.BUSCO.txt)
if [ ${#summaries[@]} -ne 1 ]; then
echo "ERROR: none or multiple 'BUSCO/short_summary.specific.*.BUSCO.txt' files found. Expected one."
exit 1
fi
[[ $summaries =~ BUSCO/short_summary.specific.(.*).BUSCO.txt ]];
db_name_spec="${BASH_REMATCH[1]}"
most_spec_db=${db_name_spec}
echo "Used specific lineage dataset: ${db_name_spec}"
if [ N = "Y" ]; then
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
# if lineage dataset is provided, BUSCO analysis does not fail in case no genes can be found as when using the auto selection setting
# report bin as failed to allow consistent warnings within the pipeline for both settings
if egrep -q $'WARNING: BUSCO did not find any match.' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log ; then
echo "WARNING: BUSCO could not find any genes for the provided lineage dataset! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log."
echo -e "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa No genes" > "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.failed_bin.txt"
fi
else
# auto lineage selection
if { egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'INFO: Lineage \S+ is selected, supported by ' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log ; } || { egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'INFO: The results from the Prodigal gene predictor indicate that your data belongs to the mollicutes clade. Testing subclades...' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'INFO: Using local lineages directory ' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log ; }; then
# the second statement is necessary, because certain mollicute clades use a different genetic code, are not part of the BUSCO placement tree, are tested separately
# and cause different log messages
echo "Domain and specific lineage could be selected by BUSCO."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
db_name_gen=""
summaries_gen=(BUSCO/short_summary.generic.*.BUSCO.txt)
if [ ${#summaries_gen[@]} -lt 1 ]; then
echo "No 'BUSCO/short_summary.generic.*.BUSCO.txt' file found. Assuming selected domain and specific lineages are the same."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.domain.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
db_name_gen=${db_name_spec}
else
[[ $summaries_gen =~ BUSCO/short_summary.generic.(.*).BUSCO.txt ]];
db_name_gen="${BASH_REMATCH[1]}"
echo "Used generic lineage dataset: ${db_name_gen}"
cp BUSCO/short_summary.generic.${db_name_gen}.BUSCO.txt short_summary.domain.${db_name_gen}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
fi
for f in BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.faa.gz
break
done
for f in BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.fna.gz
break
done
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'INFO: Not enough markers were placed on the tree \([0-9]*\). Root lineage \S+ is kept' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log ; then
echo "Domain could be selected by BUSCO, but no more specific lineage."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.domain.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'INFO: Running virus detection pipeline' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log ; then
# TODO double-check if selected dataset is not one of bacteria_*, archaea_*, eukaryota_*?
echo "Domain could not be selected by BUSCO, but virus dataset was selected."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
else
echo "ERROR: Some not expected case occurred! See MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log." >&2
exit 1
fi
fi
for f in BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_spec}.faa.gz
break
done
for f in BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_spec}.fna.gz
break
done
elif egrep -q $'ERROR: No genes were recognized by BUSCO' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err ; then
echo "WARNING: BUSCO analysis failed due to no recognized genes! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err."
echo -e "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa No genes" > "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.failed_bin.txt"
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'ERROR: Placements failed' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err ; then
echo "WARNING: BUSCO analysis failed due to failed placements! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err. Still using results for selected generic lineage dataset."
echo -e "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa Placements failed" > "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.failed_bin.txt"
message=$(egrep $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log)
[[ $message =~ INFO:[[:space:]]([_[:alnum:]]+)[[:space:]]selected ]];
db_name_gen="${BASH_REMATCH[1]}"
most_spec_db=${db_name_gen}
echo "Used generic lineage dataset: ${db_name_gen}"
cp BUSCO/auto_lineage/run_${db_name_gen}/short_summary.txt short_summary.domain.${db_name_gen}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
for f in BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.faa.gz
break
done
for f in BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.fna.gz
break
-[nf-core/mag] Pipeline completed with errors-
WARN: Killing running tasks (6)
# the second statement is necessary, because certain mollicute clades use a different genetic code, are not part of the BUSCO placement tree, are tested separately
# and cause different log messages
echo "Domain and specific lineage could be selected by BUSCO."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
db_name_gen=""
summaries_gen=(BUSCO/short_summary.generic.*.BUSCO.txt)
if [ ${#summaries_gen[@]} -lt 1 ]; then
echo "No 'BUSCO/short_summary.generic.*.BUSCO.txt' file found. Assuming selected domain and specific lineages are the same."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.domain.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
db_name_gen=${db_name_spec}
else
[[ $summaries_gen =~ BUSCO/short_summary.generic.(.*).BUSCO.txt ]];
db_name_gen="${BASH_REMATCH[1]}"
echo "Used generic lineage dataset: ${db_name_gen}"
cp BUSCO/short_summary.generic.${db_name_gen}.BUSCO.txt short_summary.domain.${db_name_gen}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
fi
for f in BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.faa.gz
break
done
for f in BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.fna.gz
break
done
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'INFO: Not enough markers were placed on the tree \([0-9]*\). Root lineage \S+ is kept' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log ; then
echo "Domain could be selected by BUSCO, but no more specific lineage."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.domain.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'INFO: Running virus detection pipeline' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log ; then
# TODO double-check if selected dataset is not one of bacteria_*, archaea_*, eukaryota_*?
echo "Domain could not be selected by BUSCO, but virus dataset was selected."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
else
echo "ERROR: Some not expected case occurred! See MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log." >&2
exit 1
fi
fi
for f in BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_spec}.faa.gz
break
done
for f in BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_spec}.fna.gz
break
done
elif egrep -q $'ERROR: No genes were recognized by BUSCO' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err ; then
echo "WARNING: BUSCO analysis failed due to no recognized genes! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err."
echo -e "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa No genes" > "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.failed_bin.txt"
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log && egrep -q $'ERROR: Placements failed' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err ; then
echo "WARNING: BUSCO analysis failed due to failed placements! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err. Still using results for selected generic lineage dataset."
echo -e "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa Placements failed" > "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.failed_bin.txt"
message=$(egrep $'INFO: \S+ selected' MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log)
[[ $message =~ INFO:[[:space:]]([_[:alnum:]]+)[[:space:]]selected ]];
db_name_gen="${BASH_REMATCH[1]}"
most_spec_db=${db_name_gen}
echo "Used generic lineage dataset: ${db_name_gen}"
cp BUSCO/auto_lineage/run_${db_name_gen}/short_summary.txt short_summary.domain.${db_name_gen}.MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa.txt
for f in BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.faa.gz
break
done
for f in BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_buscos.${db_name_gen}.fna.gz
break
done
else
echo "ERROR: BUSCO analysis failed for some unknown reason! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err." >&2
exit 1
fi
# additionally output genes predicted with Prodigal (GFF3)
if [ -f BUSCO/logs/prodigal_out.log ]; then
mv BUSCO/logs/prodigal_out.log "MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_prodigal.gff"
fi
cat <<-END_VERSIONS > versions.yml
"NFCORE_MAG:MAG:BUSCO_QC:BUSCO":
python: $(python --version 2>&1 | sed 's/Python //g')
R: $(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
busco: $(busco --version 2>&1 | sed 's/BUSCO //g')
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
ERROR: BUSCO analysis failed for some unknown reason! See also MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err.
Work dir:
/media/NGS/nf-core-workflow/work/c2/76795ccd4c946124b7723c02666717
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Join mismatch for the following entries:
- key=SPAdes-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.19.fa values=
- key=MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.10.fa values=
- key=MEGAHIT-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2.012.fa values=
- key=SPAdes-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.25.fa values=
- key=MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa values=
- key=SPAdes-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2.004.fa values=
- key=MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.9.fa values=
- key=MEGAHIT-MaxBin2-NG-30689_QN1_4_3_lib613328_10075_2.003.fa values=
- key=SPAdes-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.7.fa values=
- key=SPAdes-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.11.fa values=
(more omitted)
Relevant files
MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.log MEGAHIT-MetaBAT2-NG-30689_QN1_4_3_lib613328_10075_2.18.fa_busco.err.txt
System information
N E X T F L O W ~ version 22.04.5
nf-core/mag v2.2.0
Container engine: conda
OS:
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
Hardware: desktop with 128 Gb RAM and 32 cores
This seems bad. Could you additionally try using --busco_reference or --busco_download_path? That would mean having the files locally and therefore skipping any download step.
Also, please do not use -r fix-convert-depths-gzip but -r 2.2.1 ;)
I have seen the same error, even when specifying either --busco_reference "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz" or --busco_download_path "path/to/bacteria_odb10".
I am facing the same issue as well
Also, please do not use -r fix-convert-depths-gzip but -r 2.2.1 ;)
Hi, I was told to use this flag by @jfy133 because of issue #327.
I will try to resume the analysis with the suggested flags after I have upgraded to the latest versions, and report the results.
Like @jboktor, I can confirm that using --busco_reference or --busco_download_path does not change the outcome.
Hi, I had a similar problem recently. In my case, though, it was solvable by using -resume multiple times; it only occurred in some BUSCO processes, and the download issue did not seem to be reproducible. After a while it worked again, so I didn't dig deeper. However, I am a bit confused why the same problem occurs when using --busco_download_path, since this is used in combination with the --offline parameter.
I can have a look at this next week again.
@skrakau I think that's because --busco_download_path refers to the directory where the busco lineage files are located. It fails to retrieve https://busco-data.ezlab.org/v5/data/file_versions.tsv, which is not among the lineage files. Please correct me if I'm wrong.
Regards
Hi @ChristophKnapp, yes, it refers to the directory containing, among others, a folder with the lineage files, but this should or could also contain a file_versions.tsv file. The BUSCO user guide says one should download all files from https://busco-data.ezlab.org/v5/data/, which contains a file_versions.tsv file. (The example 'valid download folder' doesn't contain this file, though, so I guess BUSCO would then need to download it. Maybe this needs a bit more documentation for this pipeline.)
The nf-core/mag parameter --busco_download_path causes BUSCO to be run with the BUSCO parameters --offline --download_path <...>, see https://github.com/nf-core/mag/blob/a8e92af70eca59a92b72262e6cdde11e69375801/modules/local/busco.nf#L42
which should prevent BUSCO from trying to download anything. That's why I was confused that it still tries to download the file_versions.tsv file, but if the file is missing it probably makes sense that BUSCO fails.
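A minimal sketch of a pre-flight check for the folder passed to --busco_download_path, assuming the layout discussed above (a lineages/ subfolder plus a file_versions.tsv file) and assuming that both are needed for an offline run. The function name check_busco_download_dir is hypothetical and not part of nf-core/mag or BUSCO.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a BUSCO download folder looks usable offline.
# Assumption: the folder should contain <dir>/lineages/ and <dir>/file_versions.tsv.
check_busco_download_dir() {
    local dir="$1"
    local status=0
    if [ ! -d "${dir}/lineages" ]; then
        echo "missing: ${dir}/lineages" >&2
        status=1
    fi
    if [ ! -f "${dir}/file_versions.tsv" ]; then
        echo "missing: ${dir}/file_versions.tsv" >&2
        status=1
    fi
    # 0 if both entries exist, 1 otherwise
    return "${status}"
}
```

Running this on the download folder before starting the pipeline would at least show whether a missing file_versions.tsv could be the reason BUSCO tries to reach the server.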
The question remains why the download of the file fails, so talking to the BUSCO developers might be good anyway. If you create an issue, could you link it here? Otherwise I could also do it next week.
Otherwise I could also do it next week.
@skrakau, I would prefer it if you did it. You have more insight into what is going on and understand better how busco is integrated.
Thank you
Christoph
I opened an issue: https://gitlab.com/ezlab/busco/-/issues/593
Feel free to add further details, in case I forgot something.
Apparently there was a rate limit on the BUSCO server introduced a while ago, which probably caused problems in particular when multiple BUSCO processes were running in parallel and which explains why wget works without problems. This rate limit will be increased. We need to check if this will be sufficient for now. So @ChristophKnapp and @nayeimkhan, let us know if this helps.
Independently of this, we should update BUSCO to version 5.4.x at some point, which contains a failsafe mechanism that reattempts a connection in case of failure.
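To illustrate the idea of reattempting a connection, here is a generic retry-with-backoff sketch in bash. It is not the mechanism BUSCO 5.4.x actually uses; the function name retry_with_backoff and the doubling delay are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with a growing delay on failure.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt=1
    # Keep running the command until it succeeds or attempts are exhausted.
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
        sleep "${delay}"
        attempt=$((attempt + 1))
        delay=$((delay * 2))   # double the wait before the next attempt
    done
}
```

For example, `retry_with_backoff 5 10 curl --fail --silent --show-error -O https://busco-data.ezlab.org/v5/data/file_versions.tsv` would try the download up to five times, waiting 10, 20, 40 and 80 seconds between attempts. Spacing out the attempts like this is one way to be gentler on a rate-limited server when many processes run in parallel.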
hi @skrakau , the fix works. Thanks!
FYI (maybe that will help someone with similar issue):
I ran into the same problem. I am running the pipeline with AWS Batch. I tried --busco_download_path pointing to the local folder with manually unpacked data (as instructed), and for some reason the pipeline froze (with no error, just dead), showing an inactive busco process:
process > NFCORE_MAG:MAG:METABAT2_BINNING:MAG_DEPTHS_SUMMARY [100%] 1 of 1, cached: 1✔
process > NFCORE_MAG:MAG:BUSCO_QC:BUSCO (SPAdes-B220601001.49.fa) -
What helped in my case was a combination of both:
- changing the container to quay.io/biocontainers/busco:5.4.3--pyhdfd78af_0 (in busco.nf)
- providing the reference --busco_reference "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz"
I will close this issue, as the original download issue due to the rate limit was fixed. Feel free to open a new issue if similar issues occur again.
@bmlab-sg if your issue remains or re-occurs, please open a new, separate issue as well.
Hi @skrakau and @jfy133
I've just run into this old issue now, with version 2.5.4 of the pipeline. My nf-core/mag command specifies the BUSCO DB as such:
--busco_db https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz
It's probably a similar issue with multiple BUSCO jobs attempting to access the URL, and their server blocking new connections after a while:
[4d/3595c3] process > NFCORE_MAG:MAG:BUSCO_QC:BUSCO (MEGAHIT-MetaBAT2-SRR16971107.64.fa) [ 26%] 551 of 2078, failed: 1
I guess the only solution here is to download the database manually :/ and pass that to the pipeline instead.
Weirdly enough, I tried this now and STILL get the same error. To be more specific, I downloaded the archive with wget and I am running nf-core/mag with the options --busco_db bacteria_odb10.2024-01-08.tar.gz and -resume. I get the following in the output:
[f3/5ae729] process > NFCORE_MAG:MAG:BUSCO_QC:BUSCO (MEGAHIT-MetaBAT2-SRR16971103.67.fa) [ 4%] 103 of 2078, failed: 4
[- ] process > NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY -
[8c/69840e] process > NFCORE_MAG:MAG:QUAST_BINS (MEGAHIT-MetaBAT2-unclassified-unrefined-SRR16971104) [100%] 7 of 7 ✔
[- ] process > NFCORE_MAG:MAG:QUAST_BINS_SUMMARY -
[- ] process > NFCORE_MAG:MAG:CAT -
[- ] process > NFCORE_MAG:MAG:CAT_SUMMARY -
[- ] process > NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFYWF -
[- ] process > NFCORE_MAG:MAG:GTDBTK:GTDBTK_SUMMARY -
[- ] process > NFCORE_MAG:MAG:BIN_SUMMARY -
[3b/f2dd56] process > NFCORE_MAG:MAG:PROKKA (MEGAHIT-MetaBAT2-SRR16971104.441) [ 99%] 2076 of 2078, cached: 701
[- ] process > NFCORE_MAG:MAG:CUSTOM_DUMPSOFTWAREVERSIONS -
[- ] process > NFCORE_MAG:MAG:MULTIQC -
ERROR ~ Error executing process > 'NFCORE_MAG:MAG:BUSCO_QC:BUSCO (MEGAHIT-MetaBAT2-SRR16971103.34.fa)'
Caused by:
Process `NFCORE_MAG:MAG:BUSCO_QC:BUSCO (MEGAHIT-MetaBAT2-SRR16971103.34.fa)` terminated with an error exit status (1)
Command executed:
run_busco.sh "--lineage_dataset dataset/bacteria_odb10" "Y" "bacteria_odb10" "MEGAHIT-MetaBAT2-SRR16971103.34.fa" 8 "Y" "N"
most_spec_db=$(<info_most_spec_db.txt)
cat <<-END_VERSIONS > versions.yml
"NFCORE_MAG:MAG:BUSCO_QC:BUSCO":
python: $(python --version 2>&1 | sed 's/Python //g')
R: $(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
busco: $(busco --version 2>&1 | sed 's/BUSCO //g')
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
ERROR: BUSCO analysis failed for some unknown reason! See also MEGAHIT-MetaBAT2-SRR16971103.34.fa_busco.err.
Work dir:
/data/share/horia-banciu/work/f3/602e4961f0a35ee674d094dc7b6626
And this is the contents of MEGAHIT-MetaBAT2-SRR16971103.34.fa_busco.err:
2024-05-28 05:14:55 ERROR: Cannot reach https://busco-data2.ezlab.org/v5/data/file_versions.tsv
2024-05-28 05:14:55 ERROR: BUSCO analysis failed!
2024-05-28 05:14:55 ERROR: Check the logs, read the user guide (https://busco.ezlab.org/busco_userguide.html), and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues
Why is BUSCO still trying to access https://busco-data2.ezlab.org/v5/data/file_versions.tsv, when I'm running the pipeline with a local database?
I'm attaching the full log file, in case it helps: nextflow-busco-url-error.log.txt
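For anyone wanting to check how many BUSCO tasks hit this particular error, a small sketch that scans a Nextflow work directory for *_busco.err files containing the "Cannot reach" message quoted above. The function name list_busco_download_failures is hypothetical; the file suffix and the error text are taken from the logs in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list BUSCO .err files under a Nextflow work directory
# that contain the download error quoted in this thread.
list_busco_download_failures() {
    local work_dir="$1"
    # find collects all *_busco.err files; grep -l prints only the names of
    # files that match, -F treats the pattern as a fixed string.
    # '|| true' keeps the function from failing when nothing matches.
    find "${work_dir}" -type f -name '*_busco.err' \
        -exec grep -lF 'ERROR: Cannot reach https://' {} + || true
}
```

Calling it as `list_busco_download_failures /data/share/horia-banciu/work` would print one path per affected task, which helps to tell download failures apart from bins where BUSCO failed for other reasons.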
Ugh, that looks bad... Maybe it always does an internet lookup?
I've not actually used busco manually myself... @skrakau if you remember, do you have any ideas?
Facing the exact same issue, currently testing the --offline flag to see if we can force it to not do an internet lookup.
Please let me know if it works @b-kolar - I started investigating this yesterday at the airport but couldn't finish before I had to fly. Otherwise I'll get back to this on Thursday.
I can confirm that the --offline flag works with Busco!
We are testing a modified version of the mag pipeline now, which has so far passed the Busco steps without issues.
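As a rough sketch of what adding the flag could look like (not the actual change made in the pipeline), the following bash function assembles a BUSCO command line and adds --offline whenever a local lineage directory is supplied. The function name build_busco_cmd and the fixed --out BUSCO value are assumptions for illustration; the BUSCO options themselves appear in the commands quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a BUSCO command line for one bin.
# If a local lineage directory is given, add --offline so that BUSCO is asked
# not to contact the download server; otherwise fall back to --auto-lineage.
build_busco_cmd() {
    local bin_fasta="$1"
    local lineage_dir="${2:-}"
    local -a cmd=(busco --mode genome --in "${bin_fasta}" --out BUSCO)
    if [ -n "${lineage_dir}" ]; then
        cmd+=(--offline --lineage_dataset "${lineage_dir}")
    else
        cmd+=(--auto-lineage)
    fi
    # Join the array elements with spaces and print the result.
    printf '%s\n' "${cmd[*]}"
}
```

With a local dataset, `build_busco_cmd MEGAHIT-MetaBAT2-SRR16971103.34.fa dataset/bacteria_odb10` prints a command that includes --offline --lineage_dataset dataset/bacteria_odb10. Printing instead of executing keeps the sketch safe to try without BUSCO installed.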
Thank you @b-kolar! I might ping you when my implementation is ready to make sure we added it in roughly the same way, if that's ok?
@jfy133 No problem, feel free to send any questions my way!
Should be fixed here! @b-kolar could you test -r busco-offline? https://github.com/nf-core/mag/pull/633