differences
Error right on the start
I'm trying to reproduce the very first example of use:
panel_data = simulate_data()
att_gt = ATTgt(data=panel_data, cohort_name='cohort')
But I'm getting the following error. Any idea what the problem is?
KeyError                                  Traceback (most recent call last)
File C:\Programas\anaconda3\envs\python39\lib\site-packages\pandas\core\indexes\base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
File C:\Programas\anaconda3\envs\python39\lib\site-packages\pandas\_libs\index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()
File C:\Programas\anaconda3\envs\python39\lib\site-packages\pandas\_libs\index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'amin'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
Cell In[17], line 1
----> 1 att_gt = ATTgt(data=panel_data, cohort_name='cohort')
File C:\Programas\anaconda3\envs\python39\lib\site-packages\differences\attgt\attgt.py:114, in ATTgt.__init__(self, data, cohort_name, strata_name, base_period, anticipation, freq)
    111 self.anticipation = anticipation
    112 self.copy_data = True  # maybe make an option to the user
--> 114 self.data = data
    116 self._data_matrix = None
    118 self._result_dict = None
File C:\Programas\anaconda3\envs\python39\lib\site-packages\differences\tools\panel_validation.py:149, in _ValiDIData.__set__(self, instance, data)
    147 # todo: look into non ATTgt classes how to handle this cases
    148 if instance.is_panel and anticipation is not None:
--> 149     cohort_data, data = pre_process_treated_before(
    150         cohort_data=cohort_data,
    151         cohort_name=cohort_name,
    152         data=data,
    153         copy_data=copy_data,
    154     )
    156 cohort_data = pre_process_treated_after(
    157     cohort_data=cohort_data,
    158     cohort_name=cohort_name,
    159     anticipation=anticipation,  # no effect for now
    160 )
    162 no_never_treated_flag, cohort_data, data = pre_process_no_never_treated(
    163     cohort_data=cohort_data,
    164     cohort_name=cohort_name,
   (...)
    168     copy_data=copy_data,
    169 )
File C:\Programas\anaconda3\envs\python39\lib\site-packages\differences\tools\panel_validation.py:332, in pre_process_treated_before(cohort_data, cohort_name, data, copy_data)
    329 """drops always treated entities"""
    331 # entities whose event happened BEFORE the start of their time
--> 332 treated_before = cohort_data.loc[
    333     lambda x: x[cohort_name] <= x["amin"]
    334 ].index.unique()
    336 if len(treated_before):
    337     warn(
    338         f"{len(treated_before)} entities have been "
    339         f"dropped because always treated "
    340         f"(treated from before their first time)"
    341     )
File C:\Programas\anaconda3\envs\python39\lib\site-packages\pandas\core\indexing.py:1102, in _LocationIndexer.__getitem__(self, key)
   1098 else:
   1099     # we by definition only have the 0th axis
   1100     axis = self.axis or 0
-> 1102 maybe_callable = com.apply_if_callable(key, self.obj)
   1103 return self._getitem_axis(maybe_callable, axis=axis)
File C:\Programas\anaconda3\envs\python39\lib\site-packages\pandas\core\common.py:379, in apply_if_callable(maybe_callable, obj, **kwargs)
    368 """
    369 Evaluate possibly callable input using obj and kwargs if it is callable,
    370 otherwise return as it is.
   (...)
    376 **kwargs
    377 """
    378 if callable(maybe_callable):
--> 379     return maybe_callable(obj, **kwargs)
    381 return maybe_callable
File C:\Programas\anaconda3\envs\python39\lib\site-packages\differences\tools\panel_validation.py:333, in pre_process_treated_before.
File C:\Programas\anaconda3\envs\python39\lib\site-packages\pandas\core\frame.py:3761, in DataFrame.__getitem__(self, key)
   3759 if self.columns.nlevels > 1:
   3760     return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
   3762 if is_integer(indexer):
   3763     indexer = [indexer]
File C:\Programas\anaconda3\envs\python39\lib\site-packages\pandas\core\indexes\base.py:3655, in Index.get_loc(self, key)
   3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     # InvalidIndexError. Otherwise we fall through and re-raise
   3659     # the TypeError.
   3660     self._check_indexing_error(key)
KeyError: 'amin'
Hi Marcus,
Thanks for reporting this! I think I know why this is happening. There was a change, I think in pandas, after which a groupby operation that used to return a column named 'amin' now returns a column named 'min'. The validation code still looks up 'amin', hence the KeyError.
I just tried the following code on a fresh Google Colab session, which has pandas 1.5.3 installed, and it worked:
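For illustration only, here is a minimal sketch of how a renamed aggregation column produces this KeyError, and of one way to pin the column name so it does not depend on library versions. The toy frame and the column names `entity`, `time`, and `cohort` are assumptions made up for this example, and the named-aggregation approach is not necessarily how the package itself was patched.

```python
import pandas as pd

# Hypothetical toy panel; the column names are illustrative assumptions.
panel = pd.DataFrame(
    {
        "entity": [1, 1, 1, 2, 2, 2],
        "time": [2000, 2001, 2002, 2001, 2002, 2003],
        "cohort": [2001, 2001, 2001, 2000, 2000, 2000],
    }
)

# Named aggregation sets the output column name explicitly ("amin"),
# instead of relying on whatever name the library derives for the function.
first_time = panel.groupby("entity")["time"].agg(amin="min")
cohort_data = panel.groupby("entity")[["cohort"]].first().join(first_time)

# Same kind of lookup as in the traceback: entities treated at or before
# their first observed period.
treated_before = cohort_data.loc[
    lambda x: x["cohort"] <= x["amin"]
].index.unique()

# If the column had come back as "min" instead, the same lookup fails.
renamed = cohort_data.rename(columns={"amin": "min"})
try:
    renamed.loc[lambda x: x["cohort"] <= x["amin"]]
    missing = None
except KeyError as err:
    missing = err.args[0]  # name of the column that could not be found

print(list(treated_before), missing)
```

Naming the output column at the point of aggregation keeps later lookups stable even if the default naming changes between releases.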
!pip install differences
from differences import ATTgt, simulate_data
panel_data = simulate_data()
att_gt = ATTgt(data=panel_data, cohort_name='cohort')
att_gt.fit("y")
I will try to fix this asap since the problem breaks the code. I am afraid I won't be able to until the weekend, but I should get to it then. In the meantime, maybe try downgrading pandas, or create a virtual environment with the package versions currently installed in Google Colab.
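When comparing a local environment with a working one, it can help to confirm which versions the running interpreter actually sees. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_versions` and the choice to also list numpy, on which pandas depends, are assumptions for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Run this in the same kernel or interpreter that raises the error.
versions = installed_versions(["pandas", "numpy", "differences"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running it inside the notebook kernel, rather than in a separate terminal, avoids checking a different environment from the one that fails.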
Thank you!
Is this issue fixed? I'm also experiencing the same issue, even with pandas set to 1.5.3.
Should be fixed. Thanks for the report, and thanks @schrimpf for setting things in motion. I just released v0.2 with the patch for this issue.