muon icon indicating copy to clipboard operation
muon copied to clipboard

error in function ac.pl.fragment_histogram(atac, region=''")

Open Raselel opened this issue 3 years ago • 5 comments

Hello, thanks for muon!

I was trying to reproduce the steps described in the ATAC tutorial (https://muon-tutorials.readthedocs.io/en/latest/single-cell-rna-atac/pbmc10k/2-Chromatin-Accessibility-Processing.html) on my independent dataset, but I get an error from the ac.pl.fragment_histogram function.

The code:

import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import scipy
import anndata2ri

## Plotting utils
import matplotlib.pyplot as plt
import matplotlib
sc.settings.verbosity = 3  # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=150, color_map='viridis', transparent=True,
                              ipython_format="png2x")
sc.logging.print_header()

import muon as mu
# Import a module with ATAC-seq-related functions
from muon import atac as ac

atac = mu.read(path + "output/file_postrnatutorial.h5mu")

atac.var['mt'] = atac.var_names.str.startswith('MT-')  # annotate the group of mitochondrial genes as 'mt'

sc.pp.calculate_qc_metrics(atac, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
mu.pp.filter_var(atac, 'n_cells_by_counts', lambda x: x >= 10)
mu.pp.filter_obs(atac, 'n_genes_by_counts', lambda x: (x >= 2000) & (x <= 40000))
mu.pp.filter_obs(atac, 'total_counts', lambda x: (x >= 4000) & (x <= 100000))
atac.obs['NS']=1
ac.pl.fragment_histogram(atac, region='chr1:9808-10702')

The error is:

ValueError                                Traceback (most recent call last)
<ipython-input-16-0ef9424cd717> in <module>
----> 1 ac.pl.fragment_histogram(K29TIL_UD_atac, region='chr1:9808-10702')

/usr/local/lib/python3.6/dist-packages/muon/_atac/plot.py in fragment_histogram(data, region, groupby, barcodes)
    372     else:
    373         # Handle sns.distplot deprecation and sns.histplot addition
--> 374         g = hist(fragments.length, **kwargs)
    375         g.set_xlabel("Fragment length (bp)")
    376     g.set(xlim=(0, 1000))

/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
   1434             estimate_kws=estimate_kws,
   1435             line_kws=line_kws,
-> 1436             **kwargs,
   1437         )
   1438 

/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
    435 
    436             # Do the histogram computation
--> 437             heights, edges = estimator(observations, weights=weights)
    438 
    439             # Rescale the smoothed curve to match the histogram

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in __call__(self, x1, x2, weights)
    369         """Count the occurrances in each bin, maybe normalize."""
    370         if x2 is None:
--> 371             return self._eval_univariate(x1, weights)
    372         else:
    373             return self._eval_bivariate(x1, x2, weights)

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _eval_univariate(self, x, weights)
    346         bin_edges = self.bin_edges
    347         if bin_edges is None:
--> 348             bin_edges = self.define_bin_edges(x, weights=weights, cache=False)
    349 
    350         density = self.stat == "density"

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in define_bin_edges(self, x1, x2, weights, cache)
    264 
    265             bin_edges = self._define_bin_edges(
--> 266                 x1, weights, self.bins, self.binwidth, self.binrange, self.discrete,
    267             )
    268 

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _define_bin_edges(self, x, weights, bins, binwidth, binrange, discrete)
    252         elif binwidth is not None:
    253             step = binwidth
--> 254             bin_edges = np.arange(start, stop + step, step)
    255         else:
    256             bin_edges = np.histogram_bin_edges(

ValueError: arange: cannot compute length

The correct fragment file is stored in atac.uns['files'] and I've installed the relevant dependencies with pip install 'muon[atac]'. Any idea what the issue might be?

System

  • OS: [macOS Catalina]
  • Python version [3.6.9]
  • Versions of libraries involved [scanpy==1.7.2 anndata==0.7.5 umap==0.4.6 numpy==1.19.5 scipy==1.5.4 pandas==1.1.5 scikit-learn==0.24.2 statsmodels==0.12.1 python-igraph==0.8.2 louvain==0.7.0 leidenalg==0.8.2]

Thanks! Rasa

Raselel avatar Feb 23 '22 17:02 Raselel

Hey @Raselel, thanks for reporting that.

  • And what the version of seaborn used here? Seems like it can't generate a histogram for the fragments in this region.
  • Are there any fragments at all in the region chr1:9808-10702?

gtca avatar Feb 23 '22 17:02 gtca

Thanks for a quick reply.

  • seaborn version '0.11.1'
  • yes, I picked it up from the atac.var. chr1:9808-10702 has total_counts 180.0

Raselel avatar Feb 23 '22 18:02 Raselel

Thanks! Seaborn 0.11.1 seems to work for the PMBC10k and a few other datasets. I also tested the versions mentioned above in a clean environment (and pysam=0.18.0), and the following still seems to work:

import muon as mu
from muon import atac as ac

atac = mu.read("brain3k_processed.h5mu/atac")
ac.pl.fragment_histogram(atac, region='chr1:9808-10702')
  1. Is it a problem with a particular region? I.e. would it work for a larger region or other regions in general?
  2. Are other fragments-related parts functional, e.g. nucleosome signal or TSS enrichment in that notebook?

Also maybe clarifying a few things about the code above will help to understand the issue better.

Is atac above actually a MuData or an AnnData object? While the processing makes sense for an AnnData object, the reading function probably spits out a MuData?

mu.read(path + "output/file_postrnatutorial.h5mu")

And then, if it's an AnnData with e.g. peak counts and chr1:9808-10702 is in .var, there are also probably no var_names in it starting with MT-?

atac.var['mt'] = atac.var_names.str.startswith('MT-')

gtca avatar Feb 23 '22 19:02 gtca

  1. I have tried a few regions, but none plot.
  2. The nucleosome part plotting looks odd too. Screen Shot 2022-02-23 at 7 24 50 PM

I think you are right about the object: the atac.var does not have genes in var_names Screen Shot 2022-02-23 at 7 19 21 PM

This is the structure of the object, if helps. Screen Shot 2022-02-23 at 7 19 07 PM

Should I read the file differently?

Thanks!

Raselel avatar Feb 23 '22 19:02 Raselel

Hey @Raselel, to get back on this, I wasn't able to reproduce it on several scATAC-seq / multiome datasets. Was this issue resolved in any way? If not, should we figure out a way to share a portion of this dataset with us so that we can debug it?

gtca avatar Apr 26 '22 13:04 gtca

I had this same bug when I had an ATAC anndata object where I removed the "-1" from the end of the barcodes within adata.obs.index (i.e. barcode name AAACAGCCAACTAACT-1 changed to AAACAGCCAACTAACT). Adding back in the "-1" to the names in adata.obs.index fixed the problem.

connersk avatar Jan 18 '23 15:01 connersk

Thanks, @connersk, we'll try to reproduce that and fix it!

gtca avatar Jan 18 '23 15:01 gtca

Indeed, the reason seems to have been that the cell identifiers were renamed. The fragments file contains cell identifiers as well, and together with the ones in the in-memory object, those are used to link fragment -> cell information.

The solution for this is already implemented however: functions such as ac.pl.fragment_histogram and ac.tl.nucleosome_signal have the barcodes parameter. Generally, it makes sense to keep the original barcode in the metadata, and with barcodes="orig_barcode" it can be used for fragments-related functionality.

This will be available for all the functions operating on fragments in the next versions.

gtca avatar Jan 29 '23 20:01 gtca