Scale
This contribution holds modifications to the existing 'scale function' which turns out to be a faster implementation than the original one. We have introduced flavors - default and use_fastpp , where use_fastpp is our faster implementation of the scale function.The scale function modification provides an overall speedup of approx ~2x in comparison to the already existing 'scale' function. Usage : sc.pp.scale(adata,max_value=10,flavor='use_fastpp')
Codecov Report
Merging #2457 (e749022) into master (f03c5b4) will decrease coverage by
0.12%. The diff coverage is25.00%.
Additional details and impacted files
@@ Coverage Diff @@
## master #2457 +/- ##
==========================================
- Coverage 71.88% 71.76% -0.12%
==========================================
Files 98 98
Lines 11510 11537 +27
==========================================
+ Hits 8274 8280 +6
- Misses 3236 3257 +21
| Impacted Files | Coverage Δ | |
|---|---|---|
| scanpy/preprocessing/_simple.py | 74.59% <25.00%> (-3.52%) |
:arrow_down: |
@Sakshi-2797 is there a reason why you'd introduce a flavor here instead of outright replacing the implementation?
Hi @Zethson , The reason why we introduced flavors here is that we wanted the traditional implementation to be present in case anyone wanted to use it. We just introduced our implementation as a faster alternative to the already available one. In case replacing the code is required, we can do that as well.
Unfortunately, I run into
__________________________________________________________________________________ test_scale[use_fastpp] ___________________________________________________________________________________
flavor = 'use_fastpp'
@pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
def test_scale(flavor):
adata = pbmc68k_reduced()
adata.X = adata.raw.X
v = adata[:, 0 : adata.shape[1] // 2]
# Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
assert v.is_view
with pytest.warns(Warning, match="view"):
> sc.pp.scale(v, flavor=flavor)
scanpy/tests/test_preprocessing.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniconda3/envs/scanpy/lib/python3.9/functools.py:888: in wrapper
return dispatch(args[0].__class__)(*args, **kw)
scanpy/preprocessing/_simple.py:888: in scale_anndata
X, adata.var["mean"], adata.var["std"] = do_scale(
../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:468: in _compile_for_args
error_rewrite(e, 'typing')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
e = TypingError('Failed in nopython mode pipeline (step: nopython frontend)\nnon-precise type pyobject\nDuring: typing of ...y the following argument(s):\n- argument 0: Cannot determine Numba type of <class \'scipy.sparse._csr.csr_matrix\'>\n')
issue_type = 'typing'
def error_rewrite(e, issue_type):
"""
Rewrite and raise Exception `e` with help supplied based on the
specified issue_type.
"""
if config.SHOW_HELP:
help_msg = errors.error_extras[issue_type]
e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
if config.FULL_TRACEBACKS:
raise e
else:
> raise e.with_traceback(None)
E numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E non-precise type pyobject
E During: typing of argument at /home/zeth/PycharmProjects/scanpy/scanpy/preprocessing/_simple.py (763)
E
E File "scanpy/preprocessing/_simple.py", line 763:
E def do_scale(X, maxv, nthr):
E <source elided>
E # t0= time.time()
E s = np.zeros((nthr, X.shape[1]))
E ^
E
E This error may have been caused by the following argument(s):
E - argument 0: Cannot determine Numba type of <class 'scipy.sparse._csr.csr_matrix'>
../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:409: TypingError
When trying to use the new flavor with the existing test.
Unfortunately, I run into
__________________________________________________________________________________ test_scale[use_fastpp] ___________________________________________________________________________________ flavor = 'use_fastpp' @pytest.mark.parametrize("flavor", ["default", "use_fastpp"]) def test_scale(flavor): adata = pbmc68k_reduced() adata.X = adata.raw.X v = adata[:, 0 : adata.shape[1] // 2] # Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965 assert v.is_view with pytest.warns(Warning, match="view"): > sc.pp.scale(v, flavor=flavor) scanpy/tests/test_preprocessing.py:127: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../miniconda3/envs/scanpy/lib/python3.9/functools.py:888: in wrapper return dispatch(args[0].__class__)(*args, **kw) scanpy/preprocessing/_simple.py:888: in scale_anndata X, adata.var["mean"], adata.var["std"] = do_scale( ../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:468: in _compile_for_args error_rewrite(e, 'typing') _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ e = TypingError('Failed in nopython mode pipeline (step: nopython frontend)\nnon-precise type pyobject\nDuring: typing of ...y the following argument(s):\n- argument 0: Cannot determine Numba type of <class \'scipy.sparse._csr.csr_matrix\'>\n') issue_type = 'typing' def error_rewrite(e, issue_type): """ Rewrite and raise Exception `e` with help supplied based on the specified issue_type. """ if config.SHOW_HELP: help_msg = errors.error_extras[issue_type] e.patch_message('\n'.join((str(e).rstrip(), help_msg))) if config.FULL_TRACEBACKS: raise e else: > raise e.with_traceback(None) E numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) E non-precise type pyobject E During: typing of argument at /home/zeth/PycharmProjects/scanpy/scanpy/preprocessing/_simple.py (763) E E File "scanpy/preprocessing/_simple.py", line 763: E def do_scale(X, maxv, nthr): E <source elided> E # t0= time.time() E s = np.zeros((nthr, X.shape[1])) E ^ E E This error may have been caused by the following argument(s): E - argument 0: Cannot determine Numba type of <class 'scipy.sparse._csr.csr_matrix'> ../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:409: TypingErrorWhen trying to use the new flavor with the existing test.
Hi @Zethson , We are not able to see this issue with the latest commit. Can you please retry with the latest commit in scale branch.
Please adapt the corresponding test to:
@pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
def test_scale(flavor):
adata = pbmc68k_reduced()
adata.X = adata.raw.X
v = adata[:, 0 : adata.shape[1] // 2]
# Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
assert v.is_view
with pytest.warns(Warning, match="view"):
sc.pp.scale(v, flavor=flavor)
assert not v.is_view
assert_allclose(v.X.var(axis=0), np.ones(v.shape[1]), atol=0.01)
assert_allclose(v.X.mean(axis=0), np.zeros(v.shape[1]), atol=0.00001)
It fails for me with FAILED scanpy/tests/test_preprocessing.py::test_scale[use_fastpp] - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
I pushed to your branch. It failed yesterday while Github was having issues. The test should pass.
@Sakshi-2797 were you able to find the fix to make it work or do you need more information? I'm happy to take over your branch as soon as it works to glue things together
@pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
Hi @Zethson , I tried running the modified testcase mentioned above , but it seems it is failing because sparse matrix is being passed in it as a parameter. As of now, our scale function is not implemented for the sparse matrices. It is expected that these tests will fail.
Sparse matrices are absolutely essential for single-cell to ensure that RAM usage is kept low. Is there a way to make your implementation work with sparse matrices?
Sorry for the delay in responding here. Our resources got pulled into other projects. We are looking to make it work with sparse matrices.