scDRS
AssertionError when running quick test after installation
Dear scDRS devs,
Hi, I tried following the tutorial (https://martinjzhang.github.io/scDRS/index.html) and running the quick test after installing scDRS in a conda env:
git clone https://github.com/martinjzhang/scDRS.git
cd scDRS
git checkout -b v102 v1.0.2
pip install -e .
python -m pytest tests/test_CLI.py -p no:warnings
But then I ran into this error:
FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score
When I checked the output against the expected results listed on the tutorial page, only the values in the norm_score column do not match:
## my output
>>> print(df_res.iloc[:10])
raw_score norm_score mc_pval pval nlog10_pval zscore
index
N1.MAA000586.3_8_M.1.1-1-1 4.741197 4.445458 0.047619 0.001664 2.778874 2.935716
F10.D041911.3_8_M.1.1-1-1 4.739066 6.037902 0.047619 0.001664 2.778874 2.935716
A17_B002755_B007347_S17.mm10-plus-7-0 4.636626 4.697128 0.047619 0.001664 2.778874 2.935716
C22_B003856_S298_L004.mus-2-0-1 4.680566 5.186194 0.047619 0.001664 2.778874 2.935716
G12.B002765.3_38_F.1.1-1-1 4.640043 6.071957 0.047619 0.001664 2.778874 2.935716
H5.B003278.3_38_F.1.1-1-1 4.445744 -0.697608 0.714286 0.745424 0.127596 -0.660160
O14.MAA000570.3_8_M.1.1-1-1 4.455234 -1.192483 0.857143 0.868552 0.061204 -1.119574
J21.B000634.3_56_F.1.1-1-1 4.443364 -2.218681 1.000000 0.990017 0.004358 -2.326973
E5.B002765.3_38_F.1.1-1-1 4.487077 1.216147 0.142857 0.118136 0.927616 1.184354
K20_B000268_B009896_S260.mm10-plus-4-0 4.535480 -4.155231 1.000000 1.000000 -0.000000 -10.000000
I wonder what could be causing this, and whether I should worry about it before I run scDRS on my real dataset. Did you update the way you compute norm_score? Also, I see that the manual and the GitHub page mention scDRS v1.0.3, but I can't find this branch in the repo, and the version printed from Python is also 1.0.2 (with some syntax warnings):
>>> import numpy as np
>>> import pandas as pd
>>> import scanpy as sc
>>> import anndata as ad
>>> import scdrs
/home/jul307/software/scDRS/scdrs/method.py:401: SyntaxWarning: invalid escape sequence '\s'
"""Compute overdispersion score
/home/jul307/software/scDRS/scdrs/method.py:595: SyntaxWarning: invalid escape sequence '\S'
"""Compute p-value from empirical null
>>> scdrs.__version__
'1.0.2'
Could this be the reason for the inconsistency?
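As a side note on the two SyntaxWarning lines above: they come from backslash sequences such as `\s` inside ordinary (non-raw) docstrings, which newer Python versions flag at compile time. They do not change what the code computes. A minimal sketch of how such a warning arises and how a raw docstring avoids it; the helper name `compile_warnings` and the two toy source strings are made up for illustration and are not taken from scDRS:

```python
import warnings


def compile_warnings(source):
    """Compile `source` and return the names of warnings raised at compile time."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<demo>", "exec")
    return [w.category.__name__ for w in caught]


# A docstring containing "\s" in a normal string literal (hypothetical example).
plain = 'def f():\n    """Split on \\s+"""\n'
# The same docstring written as a raw string literal.
raw = 'def f():\n    r"""Split on \\s+"""\n'

print(compile_warnings(plain))  # SyntaxWarning on Python 3.12, DeprecationWarning on older versions
print(compile_warnings(raw))    # no warnings
```

Prefixing the docstring with `r` (or doubling the backslash) is the usual way to silence this kind of warning.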
Here are the packages installed in my conda env for your information:
# packages in environment at /cndd/junhao/anaconda3/envs/scDRS:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
anndata 0.10.6 pypi_0 pypi
array-api-compat 1.6 pypi_0 pypi
bzip2 1.0.8 hd590300_5 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
contourpy 1.2.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
fire 0.6.0 pypi_0 pypi
fonttools 4.50.0 pypi_0 pypi
h5py 3.10.0 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
joblib 1.3.2 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
legacy-api-wrap 1.4 pypi_0 pypi
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h807b86a_5 conda-forge
libgomp 13.2.0 h807b86a_5 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libsqlite 3.45.2 h2797004_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
llvmlite 0.42.0 pypi_0 pypi
matplotlib 3.8.4 pypi_0 pypi
natsort 8.4.0 pypi_0 pypi
ncurses 6.4.20240210 h59595ed_0 conda-forge
networkx 3.2.1 pypi_0 pypi
numba 0.59.1 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
openssl 3.2.1 hd590300_1 conda-forge
packaging 24.0 pypi_0 pypi
pandas 2.2.1 pypi_0 pypi
patsy 0.5.6 pypi_0 pypi
pillow 10.3.0 pypi_0 pypi
pip 24.0 pyhd8ed1ab_0 conda-forge
pluggy 1.4.0 pypi_0 pypi
pynndescent 0.5.12 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pytest 8.1.1 pypi_0 pypi
python 3.12.2 hab00c5b_0_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
scanpy 1.10.0 pypi_0 pypi
scdrs 1.0.2 pypi_0 pypi
scikit-learn 1.4.1.post1 pypi_0 pypi
scikit-misc 0.3.1 pypi_0 pypi
scipy 1.13.0 pypi_0 pypi
seaborn 0.13.2 pypi_0 pypi
session-info 1.0.0 pypi_0 pypi
setuptools 69.2.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pypi_0 pypi
statsmodels 0.14.1 pypi_0 pypi
stdlib-list 0.10.0 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
threadpoolctl 3.4.0 pypi_0 pypi
tk 8.6.13 noxft_h4845f30_101 conda-forge
tqdm 4.66.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
umap-learn 0.5.6 pypi_0 pypi
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
And the full error log when I ran the quick test:
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
collected 3 items
tests/test_CLI.py F.. [100%]
================================================================================================================== FAILURES ==================================================================================================================
____________________________________________________________________________________________________________ test_score_cell_cli _____________________________________________________________________________________________________________
def test_score_cell_cli():
"""
Test CLI `scdrs compute-score`
"""
# Load toy data
ROOT_DIR = scdrs.__path__[0]
H5AD_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.h5ad")
COV_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.cov")
assert os.path.exists(H5AD_FILE), "built-in data toydata_mouse.h5ad missing"
assert os.path.exists(COV_FILE), "built-in data toydata_mouse.cov missing"
tmp_dir = tempfile.TemporaryDirectory()
tmp_dir_path = tmp_dir.name
dict_df_score = {}
for gs_species in ["human", "mouse"]:
gs_file = os.path.join(ROOT_DIR, f"data/toydata_{gs_species}.gs")
# call compute_score.py
cmds = [
f"scdrs compute-score",
f"--h5ad_file {H5AD_FILE}",
"--h5ad_species mouse",
f"--gs_file {gs_file}",
f"--gs_species {gs_species}",
f"--cov_file {COV_FILE}",
"--ctrl_match_opt mean_var",
"--n_ctrl 20",
"--flag_filter_data False",
"--weight_opt vs",
"--flag_raw_count False",
"--flag_return_ctrl_raw_score False",
"--flag_return_ctrl_norm_score False",
f"--out_folder {tmp_dir_path}",
]
subprocess.check_call(" ".join(cmds), shell=True)
dict_df_score[gs_species] = pd.read_csv(
os.path.join(tmp_dir_path, f"toydata_gs_{gs_species}.score.gz"),
sep="\t",
index_col=0,
)
# consistency between human and mouse
assert np.all(dict_df_score["mouse"].pval == dict_df_score["human"].pval)
df_res = dict_df_score["mouse"]
REF_COV_FILE = os.path.join(
ROOT_DIR, "data/toydata_gs_mouse.ref_Ctrl20_CovConstCovariate.score.gz"
)
df_ref_res = pd.read_csv(REF_COV_FILE, sep="\t", index_col=0)
> compare_score_file(df_res, df_ref_res)
tests/test_CLI.py:58:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
df_res = raw_score norm_score mc_pval pval nlog10_pval zscore
index ...00 -10.000000
J10_B003899_S130.mus-7-0-1 4.460493 -1.627243 1.000000 0.956739 0.019207 -1.714034
df_res_ref = raw_score norm_score mc_pval pval nlog10_pval zscore
index ...00 -10.000000
J10_B003899_S130.mus-7-0-1 4.460493 -2.305674 1.000000 0.991680 0.003628 -2.394591
def compare_score_file(df_res, df_res_ref):
"""
Compare df_res
"""
col_list = ["raw_score", "norm_score", "mc_pval", "pval"]
for col in col_list:
v_ = df_res[col].values
v_ref = df_res_ref[col].values
err_msg = "Inconsistent values: {}\n".format(col)
err_msg += "|{:^15}|{:^15}|{:^15}|{:^15}|\n".format(
"OBS", "REF", "DIF", "REL_DIF"
)
for i in range(v_.shape[0]):
err_msg += "|{:^15.3e}|{:^15.3e}|{:^15.3e}|{:^15.3e}|\n".format(
v_[i],
v_ref[i],
v_[i] - v_ref[i],
np.absolute((v_[i] - v_ref[i]) / v_ref[i]),
)
> assert np.allclose(v_, v_ref, rtol=1e-2, equal_nan=True), err_msg
E AssertionError: Inconsistent values: norm_score
E | OBS | REF | DIF | REL_DIF |
E | 4.445e+00 | 6.326e+00 | -1.881e+00 | 2.973e-01 |
E | 6.038e+00 | 5.916e+00 | 1.216e-01 | 2.056e-02 |
E | 4.697e+00 | 5.552e+00 | -8.552e-01 | 1.540e-01 |
E | 5.186e+00 | 7.299e+00 | -2.112e+00 | 2.894e-01 |
E | 6.072e+00 | 5.779e+00 | 2.927e-01 | 5.065e-02 |
E | -6.976e-01 | -5.614e-01 | -1.362e-01 | 2.427e-01 |
E | -1.192e+00 | -1.582e+00 | 3.897e-01 | 2.463e-01 |
E | -2.219e+00 | -2.312e+00 | 9.325e-02 | 4.033e-02 |
E | 1.216e+00 | 1.157e+00 | 5.952e-02 | 5.146e-02 |
E | -4.155e+00 | -3.166e+00 | -9.896e-01 | 3.126e-01 |
E | 2.262e+00 | 1.505e+00 | 7.576e-01 | 5.035e-01 |
E | -2.240e+00 | -3.798e+00 | 1.558e+00 | 4.102e-01 |
E | 7.692e-01 | 1.052e+00 | -2.824e-01 | 2.686e-01 |
E | 2.888e-01 | -1.237e-01 | 4.126e-01 | 3.334e+00 |
E | -4.752e-01 | -8.706e-01 | 3.954e-01 | 4.541e-01 |
E | -3.281e+00 | -3.768e+00 | 4.869e-01 | 1.292e-01 |
E | -1.792e+00 | -2.232e+00 | 4.397e-01 | 1.970e-01 |
E | -7.435e-01 | -6.558e-01 | -8.775e-02 | 1.338e-01 |
E | -3.577e-01 | -4.232e-01 | 6.545e-02 | 1.547e-01 |
E | -1.968e+00 | -2.191e+00 | 2.235e-01 | 1.020e-01 |
E | -3.799e-01 | -2.172e-01 | -1.626e-01 | 7.487e-01 |
E | 7.900e-02 | -1.761e-01 | 2.551e-01 | 1.449e+00 |
E | 8.555e-01 | 7.654e-01 | 9.011e-02 | 1.177e-01 |
E | -2.135e-01 | -3.305e-01 | 1.170e-01 | 3.541e-01 |
E | -1.905e+00 | -2.228e+00 | 3.232e-01 | 1.451e-01 |
E | -3.454e+00 | -2.705e+00 | -7.495e-01 | 2.771e-01 |
E | -2.037e+00 | -2.207e+00 | 1.692e-01 | 7.670e-02 |
E | -4.795e-01 | -3.563e-01 | -1.232e-01 | 3.458e-01 |
E | -2.691e+00 | -3.141e+00 | 4.506e-01 | 1.434e-01 |
E | -1.627e+00 | -2.306e+00 | 6.784e-01 | 2.942e-01 |
E
E assert False
E + where False = <function allclose at 0x7f7f9b24a5b0>(array([ 4.4454584 , 6.037902 , 4.6971283 , 5.186194 , 6.071957 ,\n -0.6976079 , -1.1924832 , -2.2186813 , ...900415, 0.8554982 , -0.21349816, -1.9051081 ,\n -3.4541266 , -2.037314 , -0.47953042, -2.690723 , -1.6272427 ]), array([ 6.3260064 , 5.916272 , 5.5523157 , 7.2986684 , 5.7792473 ,\n -0.5613674 , -1.5821338 , -2.3119287 , ...612725, 0.7653889 , -0.33054087, -2.228345 ,\n -2.7046354 , -2.2065454 , -0.35630605, -3.1413238 , -2.3056736 ]), rtol=0.01, equal_nan=True)
E + where <function allclose at 0x7f7f9b24a5b0> = np.allclose
tests/test_method_score_cell_main.py:76: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------------------------------------
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mmusculus \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_human.gs \
--gs-species hsapiens \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpyc6576ha
Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_human': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]
Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15
Computing scDRS score:
Trait=toydata_gs_human, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mouse \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.gs \
--gs-species mouse \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpyc6576ha
Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_mouse': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]
Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15
Computing scDRS score:
Trait=toydata_gs_mouse, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 284.12it/s]
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 288.89it/s]
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score
======================================================================================================== 1 failed, 2 passed in 40.71s ========================================================================================================
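To make the size of the mismatch concrete, here is a small sketch that repeats the tolerance check from `compare_score_file` on the first five OBS/REF norm_score values copied from the failure table above. The variable names are illustrative; only `np.allclose(..., rtol=1e-2, equal_nan=True)` is taken from the test itself.

```python
import numpy as np

# First five OBS / REF norm_score values from the failure table above.
obs = np.array([4.445, 6.038, 4.697, 5.186, 6.072])
ref = np.array([6.326, 5.916, 5.552, 7.299, 5.779])

# Relative difference, computed the same way as the REL_DIF column.
rel_dif = np.abs((obs - ref) / ref)

# Element-wise and overall check with the tolerance used by the test.
within_tol = np.isclose(obs, ref, rtol=1e-2, equal_nan=True)
all_close = bool(np.allclose(obs, ref, rtol=1e-2, equal_nan=True))

for o, r, d, ok in zip(obs, ref, rel_dif, within_tol):
    print(f"obs={o:.3f} ref={r:.3f} rel_dif={d:.3f} within_tol={ok}")
print("all close:", all_close)
```

Every relative difference here is above the 1% tolerance, so this is not a borderline floating-point rounding failure.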
Hi, v1.0.3 is in the main branch. We may have updated the test data. Can you install from the main branch and run the tests again?
Same error with v1.0.3:
$ python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
collected 3 items
tests/test_CLI.py F.. [100%]
================================================================================================================== FAILURES ==================================================================================================================
____________________________________________________________________________________________________________ test_score_cell_cli _____________________________________________________________________________________________________________
def test_score_cell_cli():
"""
Test CLI `scdrs compute-score`
"""
# Load toy data
ROOT_DIR = scdrs.__path__[0]
H5AD_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.h5ad")
COV_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.cov")
assert os.path.exists(H5AD_FILE), "built-in data toydata_mouse.h5ad missing"
assert os.path.exists(COV_FILE), "built-in data toydata_mouse.cov missing"
tmp_dir = tempfile.TemporaryDirectory()
tmp_dir_path = tmp_dir.name
dict_df_score = {}
for gs_species in ["human", "mouse"]:
gs_file = os.path.join(ROOT_DIR, f"data/toydata_{gs_species}.gs")
# call compute_score.py
cmds = [
f"scdrs compute-score",
f"--h5ad_file {H5AD_FILE}",
"--h5ad_species mouse",
f"--gs_file {gs_file}",
f"--gs_species {gs_species}",
f"--cov_file {COV_FILE}",
"--ctrl_match_opt mean_var",
"--n_ctrl 20",
"--flag_filter_data False",
"--weight_opt vs",
"--flag_raw_count False",
"--flag_return_ctrl_raw_score False",
"--flag_return_ctrl_norm_score False",
f"--out_folder {tmp_dir_path}",
]
subprocess.check_call(" ".join(cmds), shell=True)
dict_df_score[gs_species] = pd.read_csv(
os.path.join(tmp_dir_path, f"toydata_gs_{gs_species}.score.gz"),
sep="\t",
index_col=0,
)
# consistency between human and mouse
assert np.all(dict_df_score["mouse"].pval == dict_df_score["human"].pval)
df_res = dict_df_score["mouse"]
REF_COV_FILE = os.path.join(
ROOT_DIR, "data/toydata_gs_mouse.ref_Ctrl20_CovConstCovariate.score.gz"
)
df_ref_res = pd.read_csv(REF_COV_FILE, sep="\t", index_col=0)
> compare_score_file(df_res, df_ref_res)
tests/test_CLI.py:58:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
df_res = raw_score norm_score mc_pval pval nlog10_pval zscore
index ...00 -10.000000
J10_B003899_S130.mus-7-0-1 4.460493 -1.627243 1.000000 0.956739 0.019207 -1.714034
df_res_ref = raw_score norm_score mc_pval pval nlog10_pval zscore
index ...00 -10.000000
J10_B003899_S130.mus-7-0-1 4.460493 -2.305674 1.000000 0.991680 0.003628 -2.394591
def compare_score_file(df_res, df_res_ref):
"""
Compare df_res
"""
col_list = ["raw_score", "norm_score", "mc_pval", "pval"]
for col in col_list:
v_ = df_res[col].values
v_ref = df_res_ref[col].values
err_msg = "Inconsistent values: {}\n".format(col)
err_msg += "|{:^15}|{:^15}|{:^15}|{:^15}|\n".format(
"OBS", "REF", "DIF", "REL_DIF"
)
for i in range(v_.shape[0]):
err_msg += "|{:^15.3e}|{:^15.3e}|{:^15.3e}|{:^15.3e}|\n".format(
v_[i],
v_ref[i],
v_[i] - v_ref[i],
np.absolute((v_[i] - v_ref[i]) / v_ref[i]),
)
> assert np.allclose(v_, v_ref, rtol=1e-2, equal_nan=True), err_msg
E AssertionError: Inconsistent values: norm_score
E | OBS | REF | DIF | REL_DIF |
E | 4.445e+00 | 6.326e+00 | -1.881e+00 | 2.973e-01 |
E | 6.038e+00 | 5.916e+00 | 1.216e-01 | 2.056e-02 |
E | 4.697e+00 | 5.552e+00 | -8.552e-01 | 1.540e-01 |
E | 5.186e+00 | 7.299e+00 | -2.112e+00 | 2.894e-01 |
E | 6.072e+00 | 5.779e+00 | 2.927e-01 | 5.065e-02 |
E | -6.976e-01 | -5.614e-01 | -1.362e-01 | 2.427e-01 |
E | -1.192e+00 | -1.582e+00 | 3.897e-01 | 2.463e-01 |
E | -2.219e+00 | -2.312e+00 | 9.325e-02 | 4.033e-02 |
E | 1.216e+00 | 1.157e+00 | 5.952e-02 | 5.146e-02 |
E | -4.155e+00 | -3.166e+00 | -9.896e-01 | 3.126e-01 |
E | 2.262e+00 | 1.505e+00 | 7.576e-01 | 5.035e-01 |
E | -2.240e+00 | -3.798e+00 | 1.558e+00 | 4.102e-01 |
E | 7.692e-01 | 1.052e+00 | -2.824e-01 | 2.686e-01 |
E | 2.888e-01 | -1.237e-01 | 4.126e-01 | 3.334e+00 |
E | -4.752e-01 | -8.706e-01 | 3.954e-01 | 4.541e-01 |
E | -3.281e+00 | -3.768e+00 | 4.869e-01 | 1.292e-01 |
E | -1.792e+00 | -2.232e+00 | 4.397e-01 | 1.970e-01 |
E | -7.435e-01 | -6.558e-01 | -8.775e-02 | 1.338e-01 |
E | -3.577e-01 | -4.232e-01 | 6.545e-02 | 1.547e-01 |
E | -1.968e+00 | -2.191e+00 | 2.235e-01 | 1.020e-01 |
E | -3.799e-01 | -2.172e-01 | -1.626e-01 | 7.487e-01 |
E | 7.900e-02 | -1.761e-01 | 2.551e-01 | 1.449e+00 |
E | 8.555e-01 | 7.654e-01 | 9.011e-02 | 1.177e-01 |
E | -2.135e-01 | -3.305e-01 | 1.170e-01 | 3.541e-01 |
E | -1.905e+00 | -2.228e+00 | 3.232e-01 | 1.451e-01 |
E | -3.454e+00 | -2.705e+00 | -7.495e-01 | 2.771e-01 |
E | -2.037e+00 | -2.207e+00 | 1.692e-01 | 7.670e-02 |
E | -4.795e-01 | -3.563e-01 | -1.232e-01 | 3.458e-01 |
E | -2.691e+00 | -3.141e+00 | 4.506e-01 | 1.434e-01 |
E | -1.627e+00 | -2.306e+00 | 6.784e-01 | 2.942e-01 |
E
E assert False
E + where False = <function allclose at 0x7f4a28366270>(array([ 4.4454584 , 6.037902 , 4.6971283 , 5.186194 , 6.071957 ,\n -0.6976079 , -1.1924832 , -2.2186813 , ...900415, 0.8554982 , -0.21349816, -1.9051081 ,\n -3.4541266 , -2.037314 , -0.47953042, -2.690723 , -1.6272427 ]), array([ 6.3260064 , 5.916272 , 5.5523157 , 7.2986684 , 5.7792473 ,\n -0.5613674 , -1.5821338 , -2.3119287 , ...612725, 0.7653889 , -0.33054087, -2.228345 ,\n -2.7046354 , -2.2065454 , -0.35630605, -3.1413238 , -2.3056736 ]), rtol=0.01, equal_nan=True)
E + where <function allclose at 0x7f4a28366270> = np.allclose
tests/test_method_score_cell_main.py:76: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------------------------------------
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mmusculus \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_human.gs \
--gs-species hsapiens \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u
Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.1s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.1s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.1s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_human': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]
Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15
Computing scDRS score:
Trait=toydata_gs_human, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.4s)
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mouse \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.gs \
--gs-species mouse \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u
Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_mouse': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]
Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15
Computing scDRS score:
Trait=toydata_gs_mouse, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 272.68it/s]
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 286.57it/s]
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score
======================================================================================================== 1 failed, 2 passed in 37.78s ========================================================================================================
I've also tried scDRS v.1.0.3 with multiple versions of Python (3.8-3.12), and the test only passed with Python 3.8 for some reason:
python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.8.19, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
plugins: anyio-3.7.1
collected 3 items
tests/test_CLI.py ... [100%]
============================================================================================================= 3 passed in 46.72s =============================================================================================================
Somewhat strangely, I couldn't replicate this error using either Python 3.9 or 3.10.
For example, in https://colab.google/ (Python 3.10):
!python --version
!pip install git+https://github.com/martinjzhang/scDRS.git
import os
import pandas as pd
import scdrs
DATA_PATH = scdrs.__path__[0]
H5AD_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.h5ad")
COV_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.cov")
GS_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.gs")
# Load .h5ad file, .cov file, and .gs file
adata = scdrs.util.load_h5ad(H5AD_FILE, flag_filter_data=False, flag_raw_count=False)
df_cov = pd.read_csv(COV_FILE, sep="\t", index_col=0)
df_gs = scdrs.util.load_gs(GS_FILE)
# Preprocess .h5ad data and compute scDRS score
scdrs.preprocess(adata, cov=df_cov)
gene_list = df_gs['toydata_gs_mouse'][0]
gene_weight = df_gs['toydata_gs_mouse'][1]
df_res = scdrs.score_cell(adata, gene_list, gene_weight=gene_weight, n_ctrl=20)
print(df_res.iloc[:4])
Strange indeed... Maybe something is wrong with my conda. But I can't think of any reason why only the norm_score is affected and why this is Python version-dependent.
Thanks for the efforts in pinpointing the issue. I'm closing this for now unless someone else runs into this. But I'd recommend updating the installation instructions in the tutorial to v.1.0.3.
I replicated this issue (with the exact norm_score values as @hoholee's) using conda + py39 on a local HPC. This might be a Python version issue. I will look into this matter further.
Fixed. The issue is due to a small discrepancy between different pandas versions. https://github.com/martinjzhang/scDRS/pull/85
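Since the cause turned out to be a pandas version difference rather than the Python version itself, it may help to include the installed package versions when comparing results across environments. A minimal sketch using only the standard library; the function name `report_versions` and the default package list are assumptions chosen for illustration, not part of scDRS:

```python
import sys
from importlib import metadata


def report_versions(packages=("scdrs", "pandas", "numpy", "scipy", "scanpy", "anndata")):
    """Return one line per package with its installed version (hypothetical helper)."""
    lines = [f"python {sys.version.split()[0]}"]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name} {version}")
    return lines


print("\n".join(report_versions()))
```

Running this in two environments that give different norm_score values makes it easier to spot which dependency differs.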