Sharing an issue when using the nf-core/mag pipeline
Description of the bug
Caused by Process `NFCORE_MAG:BINNING:MAG_DEPTHS_PLOT (MEGAHIT-MetaBAT2-G3062_unclassified)` terminated with an error exit status (1)
Command executed: plot_mag_depths.py --bin_depths MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.tsv --groups sample_groups.tsv --out "MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.heatmap.png"
Error executing process > 'NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_PLOT (MEGAHIT-MetaBAT2-G3062_unclassified)'
Caused by:
Process `NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_PLOT (MEGAHIT-MetaBAT2-G3062_unclassified)` terminated with an error exit status (1)
Command executed:
plot_mag_depths.py --bin_depths MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.tsv --groups sample_groups.tsv --out "MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.heatmap.png"
cat <<-END_VERSIONS > versions.yml
"NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_PLOT":
python: $(python --version 2>&1 | sed 's/Python //g')
pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
seaborn: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('seaborn').version)")
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
File "/home/uhlemann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 50, in <module>
sys.exit(main())
File "/home/uhlemann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 45, in main
sns.clustermap(df, row_cluster=True, yticklabels=bin_labels, cmap="vlag", center=0, col_colors=groups.group.map(color_map), figsize=(6,6))
File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
return f(**kwargs)
File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1402, in clustermap
return plotter.plot(metric=metric, method=method,
File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1220, in plot
self.plot_dendrograms(row_cluster, col_cluster, metric, method,
File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1065, in plot_dendrograms
self.dendrogram_row = dendrogram(
File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
return f(**kwargs)
File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 784, in dendrogram
plotter = _DendrogramPlotter(data, linkage=linkage, axis=axis,
File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 594, in __init__
self.linkage = self.calculated_linkage
File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 661, in calculated_linkage
return self._calculate_linkage_scipy()
File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 629, in _calculate_linkage_scipy
linkage = hierarchy.linkage(self.array, method=self.method,
File "/usr/local/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1068, in linkage
n = int(distance.num_obs_y(y))
File "/usr/local/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2572, in num_obs_y
raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.
Work dir:
/media/uhlemann/core4/01_Uhlemann_fastq/190517_CCM_MG/QC/work/c8/e2ea2e22dc8fdecce004a3a942d88f
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
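The traceback bottoms out in SciPy's hierarchical clustering, which seaborn's clustermap calls to build the row dendrogram. A minimal sketch (not from the pipeline; the depth values are invented) reproducing the same failure mode when the input matrix has a single row, i.e. a single bin:

```python
import numpy as np
from scipy.cluster import hierarchy

# A depth matrix with a single observation (one bin across three samples).
# pdist() over one row yields an empty condensed distance matrix, so
# linkage() cannot determine the number of observations and raises.
single_bin = np.array([[12.3, 0.5, 7.8]])

try:
    hierarchy.linkage(single_bin, method="average")
except ValueError as e:
    print(e)  # the same "empty distance matrix" error as in the log above
```

With two or more rows the call succeeds, which is why the plot works for samples where several bins were produced.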
Join mismatch for the following entries:
- key=MEGAHIT-MaxBin2-K2552_unclassified.009.fa values=
- key=MEGAHIT-MaxBin2-K2136_unclassified.003.fa values=
- key=MEGAHIT-MaxBin2-G3062_unclassified.001.fa values=
- key=MEGAHIT-MaxBin2-K2328_unclassified.009.fa values=
- key=MEGAHIT-MaxBin2-K2328_unclassified.002.fa values=
- key=MEGAHIT-MaxBin2-K2328_unclassified.005.fa values=
- key=MEGAHIT-MetaBAT2-K2136_unclassified.2.fa values=
- key=MEGAHIT-MetaBAT2-K2130_unclassified.2.fa values=
- key=MEGAHIT-MaxBin2-K2552_unclassified.001.fa values=
- key=MEGAHIT-MaxBin2-K2136_unclassified.006.fa values=
(more omitted)
Command used and terminal output
nextflow run nf-core/mag -profile singularity --input '*_{1,2}.fastq.gz' --busco_reference '/media/uhlemann/core4/DB/bacteria_odb10.2020-03-06.tar.gz' --outdir old_MAGs -resume --skip_spades --skip_spadeshybrid
Relevant files
No response
System information
Nextflow version: 22.04.5 build 5708
Hardware: Desktop
Container engine: Singularity
OS: Linux (Ubuntu 21)
Version of nf-core/mag: 2.2.0
Note: I've updated the title and added code block formatting
Thanks for formatting it. Unfortunately that is not a very informative error message.
@bbagy Could you try:
- resume the pipeline; maybe it was a temporary system hiccup (start the pipeline in the same directory with the same command, but append -resume)
- if the above doesn't work (small chance, but still...), could you share the files old_MAGs/GenomeBinning/depths/bins/MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.tsv (identical to /media/uhlemann/core4/01_Uhlemann_fastq/190517_CCM_MG/QC/work/c8/e2ea2e22dc8fdecce004a3a942d88f/MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.tsv) and /media/uhlemann/core4/01_Uhlemann_fastq/190517_CCM_MG/QC/work/c8/e2ea2e22dc8fdecce004a3a942d88f/sample_groups.tsv?
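For anyone debugging this, a quick check on the suspect binDepths TSV is to count its data rows. The sketch below is self-contained: the inline TSV content (bin name, sample IDs, depth values) is invented to stand in for the real file.

```python
import io

import pandas as pd

# Stand-in for a *-binDepths.tsv: bins as rows, one column of mean
# depth per sample (the values here are made up for illustration).
tsv = "bin\tG3062\tK2136\nMEGAHIT-MetaBAT2-G3062.1.fa\t12.3\t0.5\n"
df = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# With the real file, pass the file path instead of io.StringIO(tsv).
print(f"bins: {df.shape[0]}, samples: {df.shape[1]}")
```

If the bin count printed for the failing assembly is 1, that matches the empty-distance-matrix crash in the traceback.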
Hi Daniel,
I am not sure this is helpful information, but the pipeline ran fine when I used the raw fastqs. I got this error when I used fastqs from which I had filtered human reads myself, because I saw a lot of human-read contigs when using the raw fastqs. Also, it produced several TB of work files, which my desktop cannot hold.
Anyway, I am going to try it again and let you know how it looks.
Best, Heekuk
> I am not sure this is helpful information, but the pipeline ran fine when I used the raw fastqs. I got this error when I used fastqs from which I had filtered human reads myself.

If the pipeline progresses until process NFCORE_MAG:BINNING:MAG_DEPTHS_PLOT, then this means it did assemble the data, and that means your fastas should have been fine. To me, that speaks again for a -resume.

> Also, it produced several TB of work files, which my desktop cannot hold.

I asked you for only two files, not all work files. Anyway, let's hope the resume works.

> Anyway, I am going to try it again and let you know how it looks.

Great, but please test -resume, do not start the complete pipeline again ;)
Hi Daniel,
I just wonder if you have found any reason for the errors I am seeing. My samples have only very few reads left after removing human reads. Do you think it is possible that not having enough reads for assembly or binning is the reason?
I really appreciate your help and consideration.
Best, Heekuk
On Sep 16, 2022, at 11:02 AM, Heekuk Park wrote:
Hi Daniel,
Thank you for helping with this.
I noticed that I had used the ignore configuration below, so the pipeline was able to skip this part:
process { withName: MAG_DEPTHS_PLOT { errorStrategy = 'ignore' } }
So the attached files are from a run without the ignore configuration. They are not exactly the files you asked for, but I hope they are from a run that stopped with the same error.
I also included the error messages. There was an error (I call it error1) in "NFCORE_MAG:MAG:GTDBTK:GTDBTK_DB_PREPARATION" that stopped the run; when I did -resume, this error did not come up again, but I don't know whether it resolved itself.
After that, I could see the current issue. Please look at it and let me know if you have any comments.
Best, Heekuk
<nf-core_errors.zip>
PS: I wonder how I can reduce the work dir; it is growing so big.
Well, it is very well possible that with too few reads no proper genome bin is formed, simply because there is too little data. Could you have a look in results/GenomeBinning/bin_summary.tsv? Let me know what you find in that file.
The ValueError: The number of observations cannot be determined on an empty distance matrix. from the MAG_DEPTHS_PLOT process occurs when there is only one bin in the *-binDepths.tsv input file. See https://github.com/nf-core/mag/issues/383.
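For reference, the guard such a fix needs is just a row-count check before clustering. The sketch below is illustrative only: the helper name and the inline TSV content are invented, and the actual fix adopted in #383 may differ.

```python
import io

import pandas as pd

def can_cluster_rows(depths: pd.DataFrame) -> bool:
    # seaborn.clustermap builds the row dendrogram via scipy's linkage(),
    # which needs at least two observations; with a single bin the
    # condensed distance matrix is empty and linkage() raises ValueError.
    return len(depths) >= 2

# Stand-in for a *-binDepths.tsv with a single bin (the failing case).
tsv = "bin\tG3062\tK2136\nMEGAHIT-MetaBAT2-G3062.1.fa\t12.3\t0.5\n"
depths = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# Fall back to a plain (non-clustered) heatmap when only one bin exists.
plot_kind = "clustermap" if can_cluster_rows(depths) else "heatmap"
print(plot_kind)  # -> heatmap
```

In the plotting script, the fallback branch would call sns.heatmap with the same colormap arguments instead of sns.clustermap.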
I am closing this in favor of https://github.com/nf-core/mag/issues/383.