Pipeline fails due to error in "BUSCO_PLOT" process
Hi,
I am running 'nextflow run nf-core/mag -profile singularity' with auto lineage selection.
All runs well until the pipeline is killed at the NFCORE_MAG:MAG:BUSCO_QC:BUSCO_PLOT stage.
It seems there are 2 bins in my dataset (one MEGAHIT bin and one SPAdes bin) for which no BUSCO 'placement' can be made.
Is there a way to somehow skip these 2 problematic bins, or to perhaps force a Busco placement such that the pipeline can proceed to completion? I want to perform the taxonomic classification so I cannot skip this Busco step.
I have looked in the suggested logs but there is no real indication of what can be tweaked to proceed.
Any suggestions would be very much appreciated.
GL
Hi there,
unfortunately I cannot really troubleshoot without the complete error message. Could you please post the error message and/or the Nextflow log of that run? I suspect that what you are describing is not the actual cause of the failure.
Hi d4straub,
Thanks for reaching out.
I have attached here the nextflow log of the run as well as the specific Busco error log outputs for the failed placement bin. All of the other bins were successful with the Busco 'short_summary*' outputs produced as they would normally.
Much appreciated,
GL.
MEGAHIT-NS.1754.004.IDT_i7_93---IDT_i5_93.CZS48M.6.fa_busco.log.txt MEGAHIT-NS.1754.004.IDT_i7_93---IDT_i5_93.CZS48M.6.fa_busco.err.txt slurm-27062648.out.txt MEGAHIT-NS.1754.004.IDT_i7_93---IDT_i5_93.CZS48M.6.fa_busco.failed_bin.txt
MEGAHIT-NS.1754.004.IDT_i7_93---IDT_i5_93.CZS48M.6.fa_busco.log.txt MEGAHIT-NS.1754.004.IDT_i7_93---IDT_i5_93.CZS48M.6.fa_busco.err.txt MEGAHIT-NS.1754.004.IDT_i7_93---IDT_i5_93.CZS48M.6.fa_busco.failed_bin.txt
Those show that BUSCO occasionally fails to find any marker genes. That is not nice, but it is expected in some cases when MAGs/bins are small and very incomplete. The pipeline handles those cases gracefully by giving you a warning with the bin names. This is not what makes the pipeline crash.
slurm-27062648.out.txt
That one is the command line output, not the log file. It does show the complete error message, which is unfortunately not helpful. It appears that BUSCO_PLOT did indeed fail, but the cause remains unknown.
I would like to ask you to send the nextflow log. This log is in the directory in which you started nextflow and is called .nextflow.log. In case you started the pipeline multiple times, the 9 previous logs will be saved as .nextflow.log.<1-9>. I would need the log in which the error occurred. That might help.
Otherwise, to complete the analysis with as much output as possible, first run nextflow run nf-core/mag <your parameters> -resume --skip_busco (which unfortunately also prevents gtdbtk from running). Once that has finished, hopefully successfully, run nextflow run nf-core/mag <your parameters> -resume. You can also add a config file with -c that contains
process { errorStrategy = 'ignore' }, further maximizing output.
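A config file with that setting could look like the minimal sketch below. The file name ignore_errors.config is only an example; the commented-out withName block is an assumption on my part (a narrower variant that only ignores failures of BUSCO_PLOT), not part of the suggestion itself.

```groovy
// ignore_errors.config (file name is arbitrary)
// Keep the run going when a task fails instead of aborting the whole pipeline.
process {
    errorStrategy = 'ignore'

    // Narrower alternative (assumption, not part of the suggestion above):
    // remove the line above and ignore failures of BUSCO_PLOT only.
    // withName: 'BUSCO_PLOT' {
    //     errorStrategy = 'ignore'
    // }
}
```

It would be passed alongside your usual parameters, e.g. nextflow run nf-core/mag <your parameters> -resume -c ignore_errors.config.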
Thank you for the helpful insights.
Attached here is the .nextflow.log file with the corresponding Busco errors.
I hope there is a straightforward fix; otherwise I'll try what you suggest to have the pipeline complete along with the taxonomic classifications.
Thanks for the help. GL
Unfortunately I cannot pinpoint the problem. Maybe @skrakau can find the cause here (she is currently unavailable, so I am keeping this here as a reference for later).
Your best shot for now might be:
- update your Nextflow version; I don't have high hopes, but it might work
- try my suggestion (as mentioned above):
You can also add a config file with -c that contains process { errorStrategy = 'ignore' }, further maximizing output.
Ignoring the BUSCO_PLOT failure should have no consequence other than the missing MultiQC report. If you follow my complete advice above, you should get all output (including the MultiQC report), except that there will be no BUSCO output in MultiQC.
Sorry, I also cannot pinpoint the problem here. We would probably need a minimal dataset for which the problem occurs to check whether we can reproduce it.
If the error recurs and remains a problem for you, let us know.
The BUSCO_PLOT process was removed in nf-core/mag version 2.3.0, as it was undocumented anyway and the output figures were not particularly useful for metagenomic data. So this issue no longer occurs.