[Maybe Bug] IndexError: positional indexers are out-of-bounds

Open shaojunjie0912 opened this issue 11 months ago • 1 comments

Hi, first of all, thanks for your contribution.

I ran into the error below with the following code. The imputation could not finish.

I'm curious if this has something to do with the amount of data being too small.

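import miceforest as mf
import pandas as pd

if __name__ == "__main__":
    # Two small numeric columns, each with several missing values
    data = pd.DataFrame({"col1": [100, None, 200, None, 250, None, None, 200], "col2": [1, None, 3, 4, None, 6, 7, 8]})

    kernel = mf.ImputationKernel(data=data)
    kernel.mice(iterations=3)  # this call raises the IndexError shown below
    data_imputed = kernel.complete_data()
    print(data_imputed)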
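
To make the "too little data" question concrete, a minimal sketch that only counts the observed (non-missing) values per column, using plain pandas. It assumes that mean matching draws its candidates from the observed rows of each column, which this thread does not confirm; the variable name `observed_counts` is illustrative.

```python
import pandas as pd

# Same data as in the report above
data = pd.DataFrame({
    "col1": [100, None, 200, None, 250, None, None, 200],
    "col2": [1, None, 3, 4, None, 6, 7, 8],
})

# Number of observed (non-missing) values in each column
observed_counts = data.notna().sum()
print(observed_counts)
```

If the number of neighbors requested during mean matching exceeded one of these counts, an out-of-bounds index would be a plausible outcome, but that is a hypothesis rather than a confirmed cause.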

Here is my error:

Traceback (most recent call last):
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\indexing.py", line 1714, in _get_list_axis
    return self.obj._take_with_is_copy(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\generic.py", line 4153, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\generic.py", line 4133, in take
    new_data = self._mgr.take(
               ^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\internals\managers.py", line 891, in take
    indexer = maybe_convert_indices(indexer, n, verify=verify)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\indexers\utils.py", line 282, in maybe_convert_indices
    raise IndexError("indices are out-of-bounds")
IndexError: indices are out-of-bounds

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\DATA\CodeField\DataCleaning\tests\misc\test_mice.py", line 17, in <module>
    kernel.mice(iterations=3)
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\miceforest\imputation_kernel.py", line 1186, in mice
    imputation_values = self._mean_match_mice(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\miceforest\imputation_kernel.py", line 971, in _mean_match_mice
    imputation_values = self._mean_match_nearest_neighbors(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\miceforest\imputation_kernel.py", line 621, in _mean_match_nearest_neighbors
    imp_values = candidate_values.iloc[index_choice]
                 ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\indexing.py", line 1191, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\indexing.py", line 1743, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\janze\scoop\apps\miniconda3\24.9.2-0\envs\data_cleaning\Lib\site-packages\pandas\core\indexing.py", line 1717, in _get_list_axis
    raise IndexError("positional indexers are out-of-bounds") from err
IndexError: positional indexers are out-of-bounds

Here is my package version:

numpy           2.2.1
miceforest      6.0.3
pandas          2.2.3

Thank you!

shaojunjie0912, Dec 24 '24 13:12

Sorry to chime in, but I would assume this comes from scipy's KDTree query method. Namely, its documentation says:

  • i : integer or array of integers. The index of each neighbor in self.data. i is the same shape as d. Missing neighbors are indicated with self.n.

As a workaround, I'm just adding this to the code:

    index_choice = np.clip(index_choice, 0, candidate_values.shape[0] - 1)

Not a good solution, but a quick one, haha.
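
The documented behavior can be reproduced outside miceforest. This is a minimal sketch, not the library's actual code: the names `candidate_preds` and `candidate_values` and the choice of `k=5` are assumptions made for illustration. It shows that asking `scipy.spatial.KDTree.query` for more neighbors than the tree holds returns the index `tree.n` for the missing ones, that `.iloc` rejects that index, and what clipping does to it.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree

# Hypothetical stand-ins: three candidate predictions and their observed values
candidate_preds = np.array([[0.1], [0.5], [0.9]])
candidate_values = pd.Series([100.0, 200.0, 250.0])

tree = KDTree(candidate_preds)

# Request more neighbors (k=5) than there are points in the tree (3)
distances, indices = tree.query(np.array([[0.4]]), k=5)
index_choice = indices[0]

# Missing neighbors come back with infinite distance and index tree.n,
# which is one past the last valid position
print(index_choice, distances[0])

# Positional indexing with tree.n is out of bounds
try:
    candidate_values.iloc[index_choice]
    raised = False
except IndexError:
    raised = True

# The quick workaround: clip indices into the valid range
clipped = np.clip(index_choice, 0, candidate_values.shape[0] - 1)
safe_values = candidate_values.iloc[clipped]
print(safe_values.tolist())
```

Note that clipping maps every missing neighbor to the last candidate, so that value is over-represented among the choices, which is one reason it is only a stopgap.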

gencgeci, Mar 09 '25 23:03

Can you guys post a reproducible example - not sure why this would happen.

AnotherSamWilson, Oct 22 '25 13:10