
KeyError: 'clusters'

Open enzeas opened this issue 1 year ago • 1 comments

Describe the bug

When following the bulk2single tutorial, the function ov.bulk2single.bulk2single_plot_cellprop raises KeyError: 'clusters'.

To Reproduce

command: ov.bulk2single.bulk2single_plot_cellprop(generate_adata,celltype_key='clusters')

errors:

----> 1 ov.bulk2single.bulk2single_plot_cellprop(generate_adata,celltype_key='clusters')

File /data/miniforge3/envs/ov-gpu/lib/python3.10/site-packages/omicverse/bulk2single/_utils.py:69, in bulk2single_plot_cellprop(generate_single_data, celltype_key, figsize)
     67 key_name=list(generate_single_data.obs[celltype_key].cat.categories)
     68 ct_name = list(ct_stat.index)
---> 69 ct_num = list(ct_stat[celltype_key])
     70 if '{}_colors'.format(celltype_key) in generate_single_data.uns.keys():
     71     color=generate_single_data.uns['{}_colors'.format(celltype_key)]

File /data/miniforge3/envs/ov-gpu/lib/python3.10/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File /data/miniforge3/envs/ov-gpu/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'clusters'

Additional context

When the pandas function value_counts() is used, the resulting DataFrame has a column named 'count', so the key at line 69 should be 'count', not 'clusters'. (Attached: WeChat screenshot 20241210143148)
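To see the column name on your own install, here is a minimal sketch. It assumes ct_stat is built roughly as a value_counts() result turned into a DataFrame (the actual construction in _utils.py is not shown in this issue), and the toy obs table with its labels is made up for illustration.

```python
import pandas as pd

# Toy stand-in for generate_adata.obs; the labels are hypothetical.
obs = pd.DataFrame({"clusters": pd.Categorical(["T", "B", "T", "NK", "T", "B"])})

counts = obs["clusters"].value_counts()  # Series indexed by cell type
ct_stat = counts.to_frame()              # assumed shape of ct_stat

print("pandas version:", pd.__version__)
# On pandas >= 2.0 the column is named 'count'; on older versions it
# takes the name of the original column, here 'clusters'.
print("column names:", list(ct_stat.columns))
```

If the printed column name is 'count', then ct_stat['clusters'] raises the KeyError reported above.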

enzeas avatar Dec 10 '24 06:12 enzeas

After testing the script in question, I suggest the following fix:

Fix KeyError: 'clusters' in bulk2single_plot_cellprop function

  • Change ct_stat[celltype_key] to ct_stat['count'] at line 69 in omicverse/bulk2single/_utils.py to correctly reference the 'count' column created by the value_counts() method.

You can apply this change locally as a workaround until the error is fixed in the next release.

HendricksJudy avatar Dec 10 '24 11:12 HendricksJudy

@claude

Starlitnightly avatar Aug 26 '25 11:08 Starlitnightly

Claude encountered an error


Analyzing KeyError: 'clusters' issue

Todo List:

  • [x] Read the problematic file to understand the current implementation
  • [ ] Verify if the fix is already applied or if there's a discrepancy
  • [ ] Fix the KeyError by changing ct_stat[celltype_key] to ct_stat['count'] at line 69
  • [ ] Commit and push the fix
  • [ ] Create PR URL

Analysis:

I've found something interesting! Looking at the current code in omicverse/bulk2single/_utils.py, the fix appears to already be applied at line 74:

ct_num = list(ct_stat['count'])

However, the error traceback shows line 69 with ct_stat[celltype_key]. This suggests there might be a version discrepancy or the fix wasn't properly deployed. Let me verify the exact line numbers and content.
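One way to check for such a discrepancy is to inspect the installed copy directly rather than the repository. The sketch below uses only the standard library; the helper names (installed_version, module_path, find_lines) are hypothetical and not part of omicverse, and the search string "ct_num" is simply taken from the traceback above.

```python
import importlib.metadata
import importlib.util
from pathlib import Path


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def module_path(module_name):
    """Return the source file backing a module, or None if it cannot be found."""
    try:
        # Note: for dotted names this imports the parent package.
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return None
    return spec.origin if spec is not None else None


def find_lines(path, needle):
    """Return (line_number, text) pairs for lines of `path` containing `needle`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [(i, line.strip()) for i, line in enumerate(lines, start=1) if needle in line]


if __name__ == "__main__":
    print("omicverse:", installed_version("omicverse"))
    print("pandas:", installed_version("pandas"))
    path = module_path("omicverse.bulk2single._utils")
    if path is not None:
        for lineno, text in find_lines(path, "ct_num"):
            print(lineno, text)
```

Comparing the printed line with the repository version shows whether the environment that produced the traceback is running an older release.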

claude[bot] avatar Aug 26 '25 11:08 claude[bot]

Claude finished @Starlitnightly's task (branch: claude/issue-224-20250826-1215)


Fixed KeyError: 'clusters' issue ✅

Todo List:

  • [x] Read the problematic file to understand the current implementation
  • [x] Verify if the fix is already applied or if there's a discrepancy
  • [x] Fix the KeyError by using the correct pandas value_counts() access pattern
  • [x] Commit and push the fix
  • [x] Create PR URL

Problem Solved:

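The issue was a pandas version compatibility problem in the bulk2single_plot_cellprop function. Different pandas versions name the output of value_counts() differently: before pandas 2.0 the result takes the name of the original column (here 'clusters'), while from pandas 2.0 onward it is named 'count'. Hard-coding either name breaks on the other version range: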

  • Current problematic code: ct_stat['count'] - assumes column named 'count'
  • Fixed code: ct_stat.iloc[:, 0] - accesses first column regardless of name

Changes Made:

  • Updated omicverse/bulk2single/_utils.py:74
  • Changed ct_num = list(ct_stat['count']) to ct_num = list(ct_stat.iloc[:, 0])

This fix should work across pandas versions because it accesses the first (and typically only) column of the value_counts() result DataFrame by position, whatever name pandas gives it.
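The positional access can be sketched in isolation as follows. This is an illustration only: celltype_counts is a hypothetical helper, the way ct_stat is built here is an assumption about _utils.py, and the toy obs table is made up.

```python
import pandas as pd


def celltype_counts(obs, celltype_key):
    """Return (names, counts) per cell type without relying on the column name."""
    # Assumed construction of ct_stat; the real code in _utils.py may differ.
    ct_stat = obs[celltype_key].value_counts().to_frame()
    ct_name = list(ct_stat.index)
    # Positional access: works whether the column is 'count' or celltype_key.
    ct_num = list(ct_stat.iloc[:, 0])
    return ct_name, ct_num


# Toy data standing in for generate_adata.obs (labels are hypothetical).
obs = pd.DataFrame({"clusters": pd.Categorical(["T", "B", "T", "NK", "T", "B"])})
names, nums = celltype_counts(obs, "clusters")
print(names, nums)
```

An alternative that avoids the question entirely is to keep the value_counts() result as a Series and call .tolist() on it, since a Series has no column name to look up.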


claude[bot] avatar Aug 26 '25 12:08 claude[bot]