scanpy icon indicating copy to clipboard operation
scanpy copied to clipboard

Scale

Open Sakshi-2797 opened this issue 2 years ago • 11 comments

This contribution holds modifications to the existing 'scale function' which turns out to be a faster implementation than the original one. We have introduced flavors - default and use_fastpp , where use_fastpp is our faster implementation of the scale function.The scale function modification provides an overall speedup of approx ~2x in comparison to the already existing 'scale' function. Usage : sc.pp.scale(adata,max_value=10,flavor='use_fastpp')

Sakshi-2797 avatar Mar 24 '23 10:03 Sakshi-2797

Codecov Report

Merging #2457 (e749022) into master (f03c5b4) will decrease coverage by 0.12%. The diff coverage is 25.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2457      +/-   ##
==========================================
- Coverage   71.88%   71.76%   -0.12%     
==========================================
  Files          98       98              
  Lines       11510    11537      +27     
==========================================
+ Hits         8274     8280       +6     
- Misses       3236     3257      +21     
Impacted Files Coverage Δ
scanpy/preprocessing/_simple.py 74.59% <25.00%> (-3.52%) :arrow_down:

codecov[bot] avatar Mar 24 '23 11:03 codecov[bot]

@Sakshi-2797 is there a reason why you'd introduce a flavor here instead of outright replacing the implementation?

Zethson avatar Apr 17 '23 14:04 Zethson

Hi @Zethson , The reason why we introduced flavors here is that we wanted the traditional implementation to be present in case anyone wanted to use it. We just introduced our implementation as a faster alternative to the already available one. In case replacing the code is required, we can do that as well.

Sakshi-2797 avatar Apr 17 '23 15:04 Sakshi-2797

Unfortunately, I run into

__________________________________________________________________________________ test_scale[use_fastpp] ___________________________________________________________________________________

flavor = 'use_fastpp'

    @pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
    def test_scale(flavor):
        adata = pbmc68k_reduced()
        adata.X = adata.raw.X
        v = adata[:, 0 : adata.shape[1] // 2]
        # Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
        assert v.is_view
        with pytest.warns(Warning, match="view"):
>           sc.pp.scale(v, flavor=flavor)

scanpy/tests/test_preprocessing.py:127: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniconda3/envs/scanpy/lib/python3.9/functools.py:888: in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
scanpy/preprocessing/_simple.py:888: in scale_anndata
    X, adata.var["mean"], adata.var["std"] = do_scale(
../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:468: in _compile_for_args
    error_rewrite(e, 'typing')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

e = TypingError('Failed in nopython mode pipeline (step: nopython frontend)\nnon-precise type pyobject\nDuring: typing of ...y the following argument(s):\n- argument 0: Cannot determine Numba type of <class \'scipy.sparse._csr.csr_matrix\'>\n')
issue_type = 'typing'

    def error_rewrite(e, issue_type):
        """
        Rewrite and raise Exception `e` with help supplied based on the
        specified issue_type.
        """
        if config.SHOW_HELP:
            help_msg = errors.error_extras[issue_type]
            e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
        if config.FULL_TRACEBACKS:
            raise e
        else:
>           raise e.with_traceback(None)
E           numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E           non-precise type pyobject
E           During: typing of argument at /home/zeth/PycharmProjects/scanpy/scanpy/preprocessing/_simple.py (763)
E           
E           File "scanpy/preprocessing/_simple.py", line 763:
E           def do_scale(X, maxv, nthr):
E               <source elided>
E               # t0= time.time()
E               s = np.zeros((nthr, X.shape[1]))
E               ^ 
E           
E           This error may have been caused by the following argument(s):
E           - argument 0: Cannot determine Numba type of <class 'scipy.sparse._csr.csr_matrix'>

../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:409: TypingError

When trying to use the new flavor with the existing test.

Zethson avatar May 03 '23 16:05 Zethson

Unfortunately, I run into

__________________________________________________________________________________ test_scale[use_fastpp] ___________________________________________________________________________________

flavor = 'use_fastpp'

    @pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
    def test_scale(flavor):
        adata = pbmc68k_reduced()
        adata.X = adata.raw.X
        v = adata[:, 0 : adata.shape[1] // 2]
        # Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
        assert v.is_view
        with pytest.warns(Warning, match="view"):
>           sc.pp.scale(v, flavor=flavor)

scanpy/tests/test_preprocessing.py:127: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniconda3/envs/scanpy/lib/python3.9/functools.py:888: in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
scanpy/preprocessing/_simple.py:888: in scale_anndata
    X, adata.var["mean"], adata.var["std"] = do_scale(
../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:468: in _compile_for_args
    error_rewrite(e, 'typing')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

e = TypingError('Failed in nopython mode pipeline (step: nopython frontend)\nnon-precise type pyobject\nDuring: typing of ...y the following argument(s):\n- argument 0: Cannot determine Numba type of <class \'scipy.sparse._csr.csr_matrix\'>\n')
issue_type = 'typing'

    def error_rewrite(e, issue_type):
        """
        Rewrite and raise Exception `e` with help supplied based on the
        specified issue_type.
        """
        if config.SHOW_HELP:
            help_msg = errors.error_extras[issue_type]
            e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
        if config.FULL_TRACEBACKS:
            raise e
        else:
>           raise e.with_traceback(None)
E           numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E           non-precise type pyobject
E           During: typing of argument at /home/zeth/PycharmProjects/scanpy/scanpy/preprocessing/_simple.py (763)
E           
E           File "scanpy/preprocessing/_simple.py", line 763:
E           def do_scale(X, maxv, nthr):
E               <source elided>
E               # t0= time.time()
E               s = np.zeros((nthr, X.shape[1]))
E               ^ 
E           
E           This error may have been caused by the following argument(s):
E           - argument 0: Cannot determine Numba type of <class 'scipy.sparse._csr.csr_matrix'>

../../miniconda3/envs/scanpy/lib/python3.9/site-packages/numba/core/dispatcher.py:409: TypingError

When trying to use the new flavor with the existing test.

Hi @Zethson , We are not able to see this issue with the latest commit. Can you please retry with the latest commit in scale branch.

Sakshi-2797 avatar May 09 '23 11:05 Sakshi-2797

Please adapt the corresponding test to:

@pytest.mark.parametrize("flavor", ["default", "use_fastpp"])
def test_scale(flavor):
    adata = pbmc68k_reduced()
    adata.X = adata.raw.X
    v = adata[:, 0 : adata.shape[1] // 2]
    # Should turn view to copy https://github.com/scverse/anndata/issues/171#issuecomment-508689965
    assert v.is_view
    with pytest.warns(Warning, match="view"):
        sc.pp.scale(v, flavor=flavor)
    assert not v.is_view
    assert_allclose(v.X.var(axis=0), np.ones(v.shape[1]), atol=0.01)
    assert_allclose(v.X.mean(axis=0), np.zeros(v.shape[1]), atol=0.00001)

It fails for me with FAILED scanpy/tests/test_preprocessing.py::test_scale[use_fastpp] - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)

Zethson avatar May 09 '23 11:05 Zethson

I pushed to your branch. It failed yesterday while Github was having issues. The test should pass.

Zethson avatar May 10 '23 07:05 Zethson

@Sakshi-2797 were you able to find the fix to make it work or do you need more information? I'm happy to take over your branch as soon as it works to glue things together

Zethson avatar May 24 '23 12:05 Zethson

@pytest.mark.parametrize("flavor", ["default", "use_fastpp"])

Hi @Zethson , I tried running the modified testcase mentioned above , but it seems it is failing because sparse matrix is being passed in it as a parameter. As of now, our scale function is not implemented for the sparse matrices. It is expected that these tests will fail.

Sakshi-2797 avatar May 29 '23 08:05 Sakshi-2797

Sparse matrices are absolutely essential for single-cell to ensure that RAM usage is kept low. Is there a way to make your implementation work with sparse matrices?

Zethson avatar May 30 '23 07:05 Zethson

Sorry for the delay in responding here. Our resources got pulled into other projects. We are looking to make it work with sparse matrices.

sanchit-misra avatar Dec 07 '23 05:12 sanchit-misra