
Sharing an issue when using the nf-core/mag pipeline

Open bbagy opened this issue 3 years ago • 7 comments

Description of the bug

Error executing process > 'NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_PLOT (MEGAHIT-MetaBAT2-G3062_unclassified)'

Caused by:
  Process `NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_PLOT (MEGAHIT-MetaBAT2-G3062_unclassified)` terminated with an error exit status (1)

Command executed:

  plot_mag_depths.py --bin_depths MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.tsv --groups sample_groups.tsv --out "MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.heatmap.png"
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:BINNING:MAG_DEPTHS_PLOT":
      python: $(python --version 2>&1 | sed 's/Python //g')
      pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
      seaborn: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('seaborn').version)")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/home/uhlemann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 50, in <module>
      sys.exit(main())
    File "/home/uhlemann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 45, in main
      sns.clustermap(df, row_cluster=True, yticklabels=bin_labels, cmap="vlag", center=0, col_colors=groups.group.map(color_map), figsize=(6,6))
    File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1402, in clustermap
      return plotter.plot(metric=metric, method=method,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1220, in plot
      self.plot_dendrograms(row_cluster, col_cluster, metric, method,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1065, in plot_dendrograms
      self.dendrogram_row = dendrogram(
    File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 784, in dendrogram
      plotter = _DendrogramPlotter(data, linkage=linkage, axis=axis,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 594, in __init__
      self.linkage = self.calculated_linkage
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 661, in calculated_linkage
      return self._calculate_linkage_scipy()
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 629, in _calculate_linkage_scipy
      linkage = hierarchy.linkage(self.array, method=self.method,
    File "/usr/local/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1068, in linkage
      n = int(distance.num_obs_y(y))
    File "/usr/local/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2572, in num_obs_y
      raise ValueError("The number of observations cannot be determined on "
  ValueError: The number of observations cannot be determined on an empty distance matrix.

Work dir:
  /media/uhlemann/core4/01_Uhlemann_fastq/190517_CCM_MG/QC/work/c8/e2ea2e22dc8fdecce004a3a942d88f

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`


Join mismatch for the following entries: 
- key=MEGAHIT-MaxBin2-K2552_unclassified.009.fa values= 
- key=MEGAHIT-MaxBin2-K2136_unclassified.003.fa values= 
- key=MEGAHIT-MaxBin2-G3062_unclassified.001.fa values= 
- key=MEGAHIT-MaxBin2-K2328_unclassified.009.fa values= 
- key=MEGAHIT-MaxBin2-K2328_unclassified.002.fa values= 
- key=MEGAHIT-MaxBin2-K2328_unclassified.005.fa values= 
- key=MEGAHIT-MetaBAT2-K2136_unclassified.2.fa values= 
- key=MEGAHIT-MetaBAT2-K2130_unclassified.2.fa values= 
- key=MEGAHIT-MaxBin2-K2552_unclassified.001.fa values= 
- key=MEGAHIT-MaxBin2-K2136_unclassified.006.fa values=
(more omitted)

Command used and terminal output

nextflow run nf-core/mag -profile singularity --input '*_{1,2}.fastq.gz' --busco_reference '/media/uhlemann/core4/DB/bacteria_odb10.2020-03-06.tar.gz' --outdir old_MAGs -resume --skip_spades --skip_spadeshybrid

Relevant files

No response

System information

  • Nextflow version: 22.04.5.5708
  • Hardware: Desktop
  • Container engine: Singularity
  • OS: Linux (Ubuntu 21)
  • Version of nf-core/mag: 2.2.0

bbagy avatar Sep 14 '22 19:09 bbagy

Note: I've updated the title and added code block formatting

jfy133 avatar Sep 15 '22 07:09 jfy133

Thanks for formatting it. Unfortunately that is not a very informative error message.

@bbagy Could you try:

  • resume the pipeline; maybe it was a temporary system hiccup (start the pipeline in the same directory with the same command, but append -resume)
  • if the above doesn't work (small chance, but still...), could you share the files old_MAGs/GenomeBinning/depths/bins/MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.tsv (identical to /media/uhlemann/core4/01_Uhlemann_fastq/190517_CCM_MG/QC/work/c8/e2ea2e22dc8fdecce004a3a942d88f/MEGAHIT-MetaBAT2-G3062_unclassified-binDepths.tsv) and /media/uhlemann/core4/01_Uhlemann_fastq/190517_CCM_MG/QC/work/c8/e2ea2e22dc8fdecce004a3a942d88f/sample_groups.tsv?

d4straub avatar Sep 16 '22 08:09 d4straub

Hi Daniel,

I am not sure it is helpful information, but the pipeline was OK when I used raw fastqs. I got this error when I used fastqs from which I had filtered human reads myself, because I saw a lot of human-read contigs when I used the raw fastqs. Also, it produced several TB of work files, which my desktop can't hold.

Anyway, I am going to try it again and let you know how it looks.

Best, Heekuk


bbagy avatar Sep 16 '22 13:09 bbagy

I am not sure it is helpful information, but the pipeline was OK when I used raw fastqs. I got this error when I used fastqs from which I had filtered human reads myself.

If the pipeline progresses until process NFCORE_MAG:BINNING:MAG_DEPTHS_PLOT, then this means it did assemble the data. And that means your fastas should have been fine. To me, that speaks again for a -resume.

Also, it produced several TB of work files, which my desktop can't hold.

I asked you for only two files, not all work files. Anyway, let's hope the resume works.

Anyway, I am going to try it again and let you know how it looks.

Great, but please test -resume, do not start the complete pipeline again ;)

d4straub avatar Sep 16 '22 13:09 d4straub

Hi Daniel,

I just wonder if you have found any reason for the errors I have. My sequences have only very few reads left after removing human reads. Do you think it is possible that there are not enough reads for assembly or binning?

I really appreciate your help and consideration.

Best, Heekuk

On Sep 16, 2022, at 11:02 AM, Heekuk Park wrote:

Hi Daniel,

Thank you for helping with this.

I noticed that I had used the ignore configuration below, so the pipeline was able to skip this part:

process { withName: MAG_DEPTHS_PLOT { errorStrategy = 'ignore' } }

So the attached files are from a run without the ignore configuration. They are not exactly the files you asked for, but I hope they are from the run that stopped with the same error.

I also attached the error messages. There was an error (I'll call it error1) in "NFCORE_MAG:MAG:GTDBTK:GTDBTK_DB_PREPARATION" that stopped the run; when I did -resume, this error did not come up again, but I don't know whether it resolved itself.

After that I could see the current issue. Please look at it and let me know if you have any comments.

Best, Heekuk

<nf-core_errors.zip>

ps: I wonder how I can reduce the work dir; it is growing so big.


bbagy avatar Sep 28 '22 17:09 bbagy

Well, it is very well possible that with too few reads no proper genome bin is formed. Could you have a look in results/GenomeBinning/bin_summary.tsv? Let me know what you find in that file.

d4straub avatar Sep 29 '22 06:09 d4straub
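The too-few-reads hypothesis can be sanity-checked directly against the plot's input. A minimal sketch, assuming the *-binDepths.tsv layout of one row per bin and one depth column per sample (the miniature table below is made up for illustration, not the reporter's data):

```python
import io
import pandas as pd

# Hypothetical miniature *-binDepths.tsv (tab-separated); the assumed
# layout is one row per bin and one depth column per sample.
tsv = (
    "bin\tG3062_unclassified\n"
    "MEGAHIT-MetaBAT2-G3062_unclassified.1.fa\t12.7\n"
)

df = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)
n_bins = len(df)
print(f"{n_bins} bin(s) in the depths table")

# Hierarchical clustering needs at least two observations, so a
# single-bin table cannot be drawn as a clustermap.
if n_bins < 2:
    print("too few bins for row clustering")
```

If the table for the failing sample contains a single row, that would be consistent with very few reads surviving the host-read filtering.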


The ValueError: The number of observations cannot be determined on an empty distance matrix. from the MAG_DEPTHS_PLOT process occurs when there is only one bin in the *-binDepths.tsv input file. See https://github.com/nf-core/mag/issues/383.

skrakau avatar Mar 02 '23 09:03 skrakau
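The failure mode described here can be reproduced outside the pipeline: with a single observation, SciPy's condensed distance matrix is empty, and the hierarchical clustering that seaborn's clustermap runs internally fails exactly as in the traceback above. A small sketch (the depth values are made up):

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

# One row = one bin; columns = per-sample depths (hypothetical values).
single_bin = np.array([[12.7, 3.4, 0.0]])

# pdist over a single observation yields an empty condensed distance matrix.
distances = pdist(single_bin)
print(distances.size)  # 0

# hierarchy.linkage (called by seaborn.clustermap for row/column clustering)
# cannot determine the number of observations from an empty matrix.
try:
    hierarchy.linkage(distances, method="average")
except ValueError as err:
    print(err)
```

With two or more bins in the depths table, pdist returns a non-empty matrix and the clustering step succeeds.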

I'm closing this in favor of https://github.com/nf-core/mag/issues/383

d4straub avatar May 09 '23 11:05 d4straub