KeyError: 'clusters'
Describe the bug
When following the bulk2single tutorial, the function ov.bulk2single.bulk2single_plot_cellprop raises KeyError: 'clusters'.
To Reproduce
command: ov.bulk2single.bulk2single_plot_cellprop(generate_adata,celltype_key='clusters')
errors:
----> 1 ov.bulk2single.bulk2single_plot_cellprop(generate_adata,celltype_key='clusters')

File /data/miniforge3/envs/ov-gpu/lib/python3.10/site-packages/omicverse/bulk2single/_utils.py:69, in bulk2single_plot_cellprop(generate_single_data, celltype_key, figsize)
     67 key_name=list(generate_single_data.obs[celltype_key].cat.categories)
     68 ct_name = list(ct_stat.index)
---> 69 ct_num = list(ct_stat[celltype_key])
     70 if '{}_colors'.format(celltype_key) in generate_single_data.uns.keys():
     71     color=generate_single_data.uns['{}_colors'.format(celltype_key)]

File /data/miniforge3/envs/ov-gpu/lib/python3.10/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File /data/miniforge3/envs/ov-gpu/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)
KeyError: 'clusters'
Additional context
The DataFrame produced by the pandas function value_counts() has a column named 'count', so the key here should be 'count', not 'clusters'.
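The column naming can be checked with a small standalone sketch. It assumes that ct_stat is built by wrapping the result of value_counts() in a DataFrame, which the traceback does not show, and the obs table here is a made-up stand-in for generate_adata.obs:

```python
import pandas as pd

# Hypothetical stand-in for generate_adata.obs with a categorical 'clusters' column
obs = pd.DataFrame({"clusters": pd.Categorical(["B", "T", "T", "NK", "T", "B"])})

# Assumption: the plotting function builds ct_stat roughly like this
ct_stat = pd.DataFrame(obs["clusters"].value_counts())

print(pd.__version__)
# pandas >= 2.0 names the column 'count'; older releases reuse 'clusters'
print(list(ct_stat.columns))
```

If the printed column is 'count', indexing with ct_stat['clusters'] fails exactly as in the traceback above.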
Sir, after testing the specific script
Fix KeyError: 'clusters' in bulk2single_plot_cellprop function
- Change ct_stat[celltype_key] to ct_stat['count'] at line 69 in omicverse/bulk2single/_utils.py to correctly reference the 'count' column created by the value_counts() method.
You may apply this change as a workaround until the error is fixed in the next update.
@claude
Claude encountered an error —— View job
Analyzing KeyError: 'clusters' issue 
Todo List:
- [x] Read the problematic file to understand the current implementation
- [ ] Verify if the fix is already applied or if there's a discrepancy
- [ ] Fix the KeyError by changing ct_stat[celltype_key] to ct_stat['count'] at line 69
- [ ] Commit and push the fix
- [ ] Create PR URL
Analysis:
I've found something interesting! Looking at the current code in omicverse/bulk2single/_utils.py, the fix appears to already be applied at line 74:
ct_num = list(ct_stat['count'])
However, the error traceback shows line 69 with ct_stat[celltype_key]. This suggests there might be a version discrepancy or the fix wasn't properly deployed. Let me verify the exact line numbers and content.
Claude finished @Starlitnightly's task —— View job • claude/issue-224-20250826-1215 • Create PR ➔
Fixed KeyError: 'clusters' issue ✅
Todo List:
- [x] Read the problematic file to understand the current implementation
- [x] Verify if the fix is already applied or if there's a discrepancy
- [x] Fix the KeyError by using the correct pandas value_counts() access pattern
- [x] Commit and push the fix
- [x] Create PR URL
Problem Solved:
The issue was a pandas version compatibility problem in the bulk2single_plot_cellprop function. Different pandas versions handle value_counts() output differently:
- Current problematic code: ct_stat['count'] - assumes column named 'count'
- Fixed code: ct_stat.iloc[:, 0] - accesses first column regardless of name
Changes Made:
- Updated omicverse/bulk2single/_utils.py:74
- Changed ct_num = list(ct_stat['count']) to ct_num = list(ct_stat.iloc[:, 0])
This fix keeps the function working whether pandas names the value_counts() column 'count' or reuses the original column name, because it accesses the first (and typically only) column of the result DataFrame by position, not by name.
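A minimal sketch of the position-based access follows. The helper name count_cell_types and the sample obs table are hypothetical, and the way ct_stat is constructed is an assumption, not the actual omicverse code:

```python
import pandas as pd

def count_cell_types(obs, celltype_key):
    """Return cell-type names and counts without relying on the column name."""
    # Assumption: ct_stat is a one-column DataFrame built from value_counts()
    ct_stat = pd.DataFrame(obs[celltype_key].value_counts())
    ct_name = list(ct_stat.index)
    # Select the single column by position, so its name does not matter
    ct_num = list(ct_stat.iloc[:, 0])
    return ct_name, ct_num

# Made-up data standing in for generate_adata.obs
obs = pd.DataFrame({"clusters": pd.Categorical(["B", "T", "T", "NK", "T", "B"])})
ct_name, ct_num = count_cell_types(obs, "clusters")
print(dict(zip(ct_name, ct_num)))
```

Selecting by position trades a little readability for robustness: the code no longer documents which column it expects, so a short comment next to the iloc call is worth keeping.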