fix bug for partial_fit when the number of batch samples is less than n_comp
For incremental PCA: sc.tl.pca(adata, n_comps=ndim, chunked=True)
sometimes, the number of samples for the last chunk is smaller than ndim, an error would be throw:
File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/pym3c/clustering.py:377, in run_dimension_reduction(***failed resolving arguments***)
375 if not downsample or obs_chunk_size > downsample or adata.n_obs < downsample:
376 logger.info(f"Running IncrementalPCA without downsampling")
--> 377 sc.tl.pca(adata, n_comps=ndim, chunked=True,
378 chunk_size=obs_chunk_size)
379 else: # downsample
380 logger.info(f"Running IncrementalPCA with downsample = {downsample}")
File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/scanpy/preprocessing/_pca.py:255, in pca(***failed resolving arguments***)
253 for chunk, _, _ in adata_comp.chunked_X(chunk_size):
254 chunk = chunk.toarray() if issparse(chunk) else chunk
--> 255 pca_.partial_fit(chunk)
257 for chunk, start, end in adata_comp.chunked_X(chunk_size):
258 chunk = chunk.toarray() if issparse(chunk) else chunk
File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/sklearn/base.py:1473, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1466 estimator._validate_params()
1468 with config_context(
1469 skip_parameter_validation=(
1470 prefer_skip_nested_validation or global_skip_validation
1471 )
1472 ):
-> 1473 return fit_method(estimator, *args, **kwargs)
File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/sklearn/decomposition/_incremental_pca.py:304, in IncrementalPCA.partial_fit(self, X, y, check_input)
298 raise ValueError(
299 "n_components=%r invalid for n_features=%d, need "
300 "more rows than columns for IncrementalPCA "
301 "processing" % (self.n_components, n_features)
302 )
303 elif not self.n_components <= n_samples:
--> 304 raise ValueError(
305 "n_components=%r must be less or equal to "
306 "the batch number of samples "
307 "%d." % (self.n_components, n_samples)
308 )
309 else:
310 self.n_components_ = self.n_components
ValueError: n_components=100 must be less or equal to the batch number of samples 77
To fix this bug, I added a try and except to line 255 of _pca.py.
- [ ] Closes #
- [x] Tests included or not required because:
- [ ] Release notes not necessary because:
Codecov Report
Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
Project coverage is 76.62%. Comparing base (
bec794c) to head (8d1cb04). Report is 35 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scanpy/preprocessing/_pca.py | 50.00% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #3227 +/- ##
==========================================
- Coverage 76.63% 76.62% -0.02%
==========================================
Files 109 109
Lines 12533 12536 +3
==========================================
+ Hits 9605 9606 +1
- Misses 2928 2930 +2
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/scanpy/preprocessing/_pca.py | 92.47% <50.00%> (-0.97%) |
:arrow_down: |
Hi, can you please create an issue with a minimal reproducible example?
Alternatively please add a unit test that will trigger your newly added branch. You’ll be able to see if that worked when this comment goes away:
Lastly, please follow the pre-commit instructions:
src/scanpy/preprocessing/_pca.py:268:13: E722 Do not use bare `except` | 266 | try: 267 | pca_.partial_fit(chunk) 268 | except: | ^^^^^^ E722 269 | continue |Found 1 error.