
Fix bug in partial_fit when the number of samples in a batch is less than n_comps

Open DingWB opened this issue 1 year ago • 2 comments

For incremental PCA with `sc.tl.pca(adata, n_comps=ndim, chunked=True)`, the last chunk sometimes contains fewer samples than `ndim`, and the following error is thrown:

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/pym3c/clustering.py:377, in run_dimension_reduction(***failed resolving arguments***)
    375 if not downsample or obs_chunk_size > downsample or adata.n_obs < downsample:
    376         logger.info(f"Running IncrementalPCA without downsampling")
--> 377         sc.tl.pca(adata, n_comps=ndim, chunked=True,
    378                           chunk_size=obs_chunk_size)
    379 else: # downsample
    380         logger.info(f"Running IncrementalPCA with downsample = {downsample}")

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/scanpy/preprocessing/_pca.py:255, in pca(***failed resolving arguments***)
    253 for chunk, _, _ in adata_comp.chunked_X(chunk_size):
    254     chunk = chunk.toarray() if issparse(chunk) else chunk
--> 255     pca_.partial_fit(chunk)
    257 for chunk, start, end in adata_comp.chunked_X(chunk_size):
    258     chunk = chunk.toarray() if issparse(chunk) else chunk

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/sklearn/base.py:1473, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1466     estimator._validate_params()
   1468 with config_context(
   1469     skip_parameter_validation=(
   1470         prefer_skip_nested_validation or global_skip_validation
   1471     )
   1472 ):
-> 1473     return fit_method(estimator, *args, **kwargs)

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/sklearn/decomposition/_incremental_pca.py:304, in IncrementalPCA.partial_fit(self, X, y, check_input)
    298     raise ValueError(
    299         "n_components=%r invalid for n_features=%d, need "
    300         "more rows than columns for IncrementalPCA "
    301         "processing" % (self.n_components, n_features)
    302     )
    303 elif not self.n_components <= n_samples:
--> 304     raise ValueError(
    305         "n_components=%r must be less or equal to "
    306         "the batch number of samples "
    307         "%d." % (self.n_components, n_samples)
    308     )
    309 else:
    310     self.n_components_ = self.n_components

ValueError: n_components=100 must be less or equal to the batch number of samples 77
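The failure depends only on chunk arithmetic: every batch passed to `partial_fit` needs at least `n_comps` rows, but the last chunk holds only the remainder `n_obs % chunk_size`. The sketch below uses hypothetical pure-Python helpers (`chunk_bounds` and `undersized_chunks` are not scanpy or anndata functions). It assumes chunks are contiguous row ranges of `chunk_size` rows, as the `(chunk, start, end)` tuples from `chunked_X` suggest, and flags the chunks that would trigger this error.

```python
def chunk_bounds(n_obs: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return (start, end) row ranges for consecutive chunks of `chunk_size` rows.

    Assumption: this mirrors how rows are split by `AnnData.chunked_X`; the last
    range holds the remainder when n_obs is not a multiple of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_obs)) for start in range(0, n_obs, chunk_size)]


def undersized_chunks(n_obs: int, chunk_size: int, n_comps: int) -> list[tuple[int, int]]:
    """Return the chunks with fewer rows than n_comps.

    IncrementalPCA.partial_fit raises ValueError on each of these batches.
    """
    return [(s, e) for s, e in chunk_bounds(n_obs, chunk_size) if e - s < n_comps]


# Illustrative numbers: 2077 rows in chunks of 1000 leave a last chunk of 77 rows.
print(chunk_bounds(2077, 1000))            # [(0, 1000), (1000, 2000), (2000, 2077)]
print(undersized_chunks(2077, 1000, 100))  # [(2000, 2077)]
```

Any `n_obs` whose remainder modulo `chunk_size` is nonzero but below `n_comps` reproduces the error.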

To fix this bug, I wrapped the `pca_.partial_fit(chunk)` call at line 255 of `_pca.py` in a try/except block that skips the failing chunk.

  • [ ] Closes #
  • [x] Tests included or not required because:
  • [ ] Release notes not necessary because:

DingWB, Sep 12 '24 16:09
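The try/except in this PR skips a chunk whose `partial_fit` call fails, so those rows are left out of the fit. One alternative, sketched here as a hypothetical helper and not what this PR implements, is to merge an undersized trailing chunk into the preceding one so that every batch given to `partial_fit` has at least `n_comps` rows. The sketch assumes chunks are contiguous row ranges of `chunk_size` rows; `merged_chunk_bounds` is an illustrative name, not a scanpy function.

```python
def merged_chunk_bounds(n_obs: int, chunk_size: int, n_comps: int) -> list[tuple[int, int]]:
    """Split rows into (start, end) chunks so that each chunk has >= n_comps rows.

    Hypothetical helper: chunks are contiguous ranges of `chunk_size` rows,
    except that a trailing remainder smaller than n_comps is merged into the
    preceding chunk instead of being fitted (or skipped) on its own.
    If n_obs itself is below n_comps, the single chunk is still too small and
    partial_fit would raise, which is the correct outcome.
    """
    if chunk_size < n_comps:
        # Every full chunk would already be too small for partial_fit.
        raise ValueError(f"chunk_size={chunk_size} must be >= n_comps={n_comps}")
    bounds = [(start, min(start + chunk_size, n_obs)) for start in range(0, n_obs, chunk_size)]
    if len(bounds) > 1 and bounds[-1][1] - bounds[-1][0] < n_comps:
        _, last_end = bounds.pop()     # drop the undersized remainder...
        prev_start, _ = bounds.pop()   # ...and the chunk before it...
        bounds.append((prev_start, last_end))  # ...then re-add them as one chunk
    return bounds


print(merged_chunk_bounds(2077, 1000, 100))  # [(0, 1000), (1000, 2077)]
```

The merged chunk exceeds `chunk_size` by fewer than `n_comps` rows, so the extra memory per batch stays bounded.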

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 76.62%. Comparing base (bec794c) to head (8d1cb04). Report is 35 commits behind head on main.

File with missing lines             Patch %   Lines
src/scanpy/preprocessing/_pca.py    50.00%    2 missing
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3227      +/-   ##
==========================================
- Coverage   76.63%   76.62%   -0.02%     
==========================================
  Files         109      109              
  Lines       12533    12536       +3     
==========================================
+ Hits         9605     9606       +1     
- Misses       2928     2930       +2     
File with missing lines             Coverage Δ
src/scanpy/preprocessing/_pca.py    92.47% <50.00%> (-0.97%)

codecov[bot], Sep 12 '24 17:09

Hi, can you please create an issue with a minimal reproducible example?

Alternatively, please add a unit test that triggers your newly added branch. You'll know it worked when this Codecov warning goes away:

[image]

Lastly, please follow the pre-commit instructions:

src/scanpy/preprocessing/_pca.py:268:13: E722 Do not use bare `except`
    |
266 |             try:
267 |                 pca_.partial_fit(chunk)
268 |             except:
    |             ^^^^^^ E722
269 |                 continue
    |

Found 1 error.

flying-sheep, Sep 24 '24 12:09
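E722 flags a bare `except:` because it also catches `KeyboardInterrupt` and `SystemExit`. A minimal sketch of the narrower pattern, assuming the only error to tolerate is the `ValueError` from `partial_fit` shown in the traceback. `fit_chunks` and the stand-in `fake_partial_fit` are hypothetical, used so the example runs without scanpy or scikit-learn.

```python
from typing import Callable, Iterable, Sequence


def fit_chunks(partial_fit: Callable[[Sequence], object], chunks: Iterable[Sequence]) -> list[int]:
    """Call partial_fit on each chunk, skipping chunks that raise ValueError.

    Returns the indices of skipped chunks. Unlike a bare `except:`, this lets
    KeyboardInterrupt, SystemExit, and other exception types propagate.
    """
    skipped = []
    for i, chunk in enumerate(chunks):
        try:
            partial_fit(chunk)
        except ValueError:
            skipped.append(i)
    return skipped


def fake_partial_fit(chunk: Sequence, n_comps: int = 100) -> None:
    # Stand-in for IncrementalPCA.partial_fit: rejects batches smaller than n_comps.
    if len(chunk) < n_comps:
        raise ValueError(
            f"n_components={n_comps} must be less or equal to "
            f"the batch number of samples {len(chunk)}."
        )


print(fit_chunks(fake_partial_fit, [range(1000), range(1000), range(77)]))  # [2]
```

A `ValueError` raised for any other reason would also be skipped silently here, so checking the chunk's row count against `n_comps` before calling `partial_fit` is the stricter option.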