scrublet
scrub_doublets still completes without annoy, nearest neighbor search
First of all, thanks for scrublet! I have been using it for a while now, and much prefer it over the alternatives.
As for the issue, it's a bit niche but can potentially cause serious silent problems on an HPC cluster. Even if annoy is installed, loading it can fail if a semi-recent version of gcc is not currently in the user's path. For HPC users, this would generally require loading a GCC module. In my case, module load gcc/11.2.0 restores the missing library and solves the issue.
Minimal code to reproduce the issue:
import scrublet as scr
import scipy.io
import os
import gzip
import pandas as pd
counts_matrix = scipy.io.mmread(gzip.open("path/matrix.mtx.gz")).T.tocsc()
scrub = scr.Scrublet(counts_matrix, expected_doublet_rate = 0.1)
doublet_scores, doublets = scrub.scrub_doublets(min_counts=2, min_cells=3, min_gene_variability_pctl=85, n_prin_comps=30)
And the behavior:
Preprocessing...
Simulating doublets...
Embedding transcriptomes using PCA...
Calculating doublet scores...
Could not find library "annoy" for approx. nearest neighbor search
Automatically set threshold at doublet score = 0.62
Detected doublet rate = 0.4%
Estimated detectable doublet fraction = 4.5%
Overall doublet rate:
Expected = 10.0%
Estimated = 8.1%
Elapsed time: 10.9 seconds
In this case, the doublet rate is still estimated, but apparently without finding nearest neighbors for simulated doublets. Or perhaps another method is used? Either way, it would be worth raising a stronger warning of some sort, or even failing, in this case. If this analysis is automated, these sorts of messages may be missed entirely.
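Until something like that exists in scrublet itself, a calling script can make the failure loud on its own. Below is a minimal sketch, assuming a hypothetical helper named require_module (it is not part of scrublet) that is called before scrub_doublets in an automated pipeline:

```python
import importlib


def require_module(name):
    """Import `name` or raise, instead of allowing a silent fallback.

    Hypothetical helper for illustration only; not part of scrublet.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # ImportError covers both "package not installed" and "package
        # installed but a shared library failed to load", which is the
        # CXXABI case described in this issue.
        raise RuntimeError(
            f"Required module {name!r} could not be imported: {err}"
        ) from err


# Intended use, placed before scrub.scrub_doublets(...):
#     require_module("annoy")
```

Raising an exception rather than printing means a batch job exits with a non-zero status, which is harder to overlook than a line in a log file.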
And for clarity, this is what happens if I import annoy without a gcc module loaded, even though annoy is installed in my virtual environment:
import annoy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "path/venv/lib/python3.9/site-packages/annoy/__init__.py", line 16, in <module>
from .annoylib import Annoy as AnnoyIndex
ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by path/venv/lib/python3.9/site-packages/annoy/annoylib.cpython-39-x86_64-linux-gnu.so)
You can run scrublet in your own conda environment and point the loader at that environment's lib directory rather than the HPC's system libraries (replace with the appropriate directory): export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/miniconda3/lib
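One caveat: on Linux the dynamic loader generally reads LD_LIBRARY_PATH when a process starts, so the variable has to be set before Python is launched; changing os.environ inside an already running interpreter is usually too late. The sketch below checks whether a candidate directory fixes the import by testing it in a child process. The function names and the example directory are assumptions for illustration, not anything provided by scrublet or annoy.

```python
import os
import subprocess
import sys


def env_with_lib_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir appended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Append, mirroring the export line above; avoid a leading ":" when unset.
    env["LD_LIBRARY_PATH"] = f"{existing}:{lib_dir}" if existing else lib_dir
    return env


def imports_cleanly(module, env):
    """Try `import module` in a fresh interpreter started with `env`.

    Returns (succeeded, stderr_text).
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


# Example (the directory is an assumption; replace with your own):
#     env = env_with_lib_dir(os.path.expanduser("~/miniconda3/lib"))
#     ok, err = imports_cleanly("annoy", env)
```

If the check succeeds, the same directory can be exported in the job script before the analysis starts.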