muon
muon copied to clipboard
error in function ac.pl.fragment_histogram(atac, region=''")
Hello, thanks for muon!
I was trying to reproduce the steps described in the ATAC tutorial (https://muon-tutorials.readthedocs.io/en/latest/single-cell-rna-atac/pbmc10k/2-Chromatin-Accessibility-Processing.html) on my independent dataset, but I get an error from the ac.pl.fragment_histogram function.
The code:
import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import scipy
import anndata2ri
## Plotting utils
import matplotlib.pyplot as plt
import matplotlib
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=150, color_map='viridis', transparent=True,
ipython_format="png2x")
sc.logging.print_header()
import muon as mu
# Import a module with ATAC-seq-related functions
from muon import atac as ac
atac = mu.read(path + "output/file_postrnatutorial.h5mu")
atac.var['mt'] = atac.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(atac, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
mu.pp.filter_var(atac, 'n_cells_by_counts', lambda x: x >= 10)
mu.pp.filter_obs(atac, 'n_genes_by_counts', lambda x: (x >= 2000) & (x <= 40000))
mu.pp.filter_obs(atac, 'total_counts', lambda x: (x >= 4000) & (x <= 100000))
atac.obs['NS']=1
ac.pl.fragment_histogram(atac, region='chr1:9808-10702')
The error is:
ValueError Traceback (most recent call last)
<ipython-input-16-0ef9424cd717> in <module>
----> 1 ac.pl.fragment_histogram(K29TIL_UD_atac, region='chr1:9808-10702')
/usr/local/lib/python3.6/dist-packages/muon/_atac/plot.py in fragment_histogram(data, region, groupby, barcodes)
372 else:
373 # Handle sns.distplot deprecation and sns.histplot addition
--> 374 g = hist(fragments.length, **kwargs)
375 g.set_xlabel("Fragment length (bp)")
376 g.set(xlim=(0, 1000))
/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
1434 estimate_kws=estimate_kws,
1435 line_kws=line_kws,
-> 1436 **kwargs,
1437 )
1438
/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
435
436 # Do the histogram computation
--> 437 heights, edges = estimator(observations, weights=weights)
438
439 # Rescale the smoothed curve to match the histogram
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in __call__(self, x1, x2, weights)
369 """Count the occurrances in each bin, maybe normalize."""
370 if x2 is None:
--> 371 return self._eval_univariate(x1, weights)
372 else:
373 return self._eval_bivariate(x1, x2, weights)
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _eval_univariate(self, x, weights)
346 bin_edges = self.bin_edges
347 if bin_edges is None:
--> 348 bin_edges = self.define_bin_edges(x, weights=weights, cache=False)
349
350 density = self.stat == "density"
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in define_bin_edges(self, x1, x2, weights, cache)
264
265 bin_edges = self._define_bin_edges(
--> 266 x1, weights, self.bins, self.binwidth, self.binrange, self.discrete,
267 )
268
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _define_bin_edges(self, x, weights, bins, binwidth, binrange, discrete)
252 elif binwidth is not None:
253 step = binwidth
--> 254 bin_edges = np.arange(start, stop + step, step)
255 else:
256 bin_edges = np.histogram_bin_edges(
ValueError: arange: cannot compute length
The correct fragment file is stored in atac.uns['files'] and I've installed the relevant dependencies with pip install 'muon[atac]'. Any idea what the issue might be?
System
- OS: [macOS Catalina]
- Python version [3.6.9]
- Versions of libraries involved [scanpy==1.7.2 anndata==0.7.5 umap==0.4.6 numpy==1.19.5 scipy==1.5.4 pandas==1.1.5 scikit-learn==0.24.2 statsmodels==0.12.1 python-igraph==0.8.2 louvain==0.7.0 leidenalg==0.8.2]
Thanks! Rasa
Hey @Raselel, thanks for reporting that.
- And what the version of seaborn used here? Seems like it can't generate a histogram for the fragments in this region.
- Are there any fragments at all in the region
chr1:9808-10702
?
Thanks for a quick reply.
- seaborn version '0.11.1'
- yes, I picked it up from the atac.var.
chr1:9808-10702
has total_counts 180.0
Thanks! Seaborn 0.11.1 seems to work for the PMBC10k and a few other datasets.
I also tested the versions mentioned above in a clean environment (and pysam=0.18.0
), and the following still seems to work:
import muon as mu
from muon import atac as ac
atac = mu.read("brain3k_processed.h5mu/atac")
ac.pl.fragment_histogram(atac, region='chr1:9808-10702')
- Is it a problem with a particular region? I.e. would it work for a larger region or other regions in general?
- Are other fragments-related parts functional, e.g. nucleosome signal or TSS enrichment in that notebook?
Also maybe clarifying a few things about the code above will help to understand the issue better.
Is atac
above actually a MuData or an AnnData object? While the processing makes sense for an AnnData object, the reading function probably spits out a MuData?
mu.read(path + "output/file_postrnatutorial.h5mu")
And then, if it's an AnnData with e.g. peak counts and chr1:9808-10702
is in .var
, there are also probably no var_names
in it starting with MT-
?
atac.var['mt'] = atac.var_names.str.startswith('MT-')
- I have tried a few regions, but none plot.
- The nucleosome part plotting looks odd too.
I think you are right about the object: the atac.var does not have genes in var_names
This is the structure of the object, if helps.
Should I read the file differently?
Thanks!
Hey @Raselel, to get back on this, I wasn't able to reproduce it on several scATAC-seq / multiome datasets. Was this issue resolved in any way? If not, should we figure out a way to share a portion of this dataset with us so that we can debug it?
I had this same bug when I had an ATAC anndata object where I removed the "-1" from the end of the barcodes within adata.obs.index (i.e. barcode name AAACAGCCAACTAACT-1 changed to AAACAGCCAACTAACT). Adding back in the "-1" to the names in adata.obs.index fixed the problem.
Thanks, @connersk, we'll try to reproduce that and fix it!
Indeed, the reason seems to have been that the cell identifiers were renamed. The fragments file contains cell identifiers as well, and together with the ones in the in-memory object, those are used to link fragment -> cell information.
The solution for this is already implemented however: functions such as ac.pl.fragment_histogram
and ac.tl.nucleosome_signal
have the barcodes
parameter. Generally, it makes sense to keep the original barcode in the metadata, and with barcodes="orig_barcode"
it can be used for fragments-related functionality.
This will be available for all the functions operating on fragments in the next versions.