
Error while calculating standard errors

Open achinmay17 opened this issue 1 year ago • 3 comments

Hi, I am trying to run Doubly Robust S-DID with an unbalanced panel and a varying base period, using 'not_yet_treated' as the control group. My code is as follows:

    from differences import ATTgt

    att_gt = ATTgt(data=diddata, cohort_name="course_month_end_date", base_period='varying', freq='M')
    att_gt.fit(formula=formula, est_method='dr', control_group='not_yet_treated', progress_bar=True)

However, I am getting the following error, which I am not able to understand:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[241], line 7
      4 diddata = diddata.reset_index().set_index(keys=['id','month_end_date'])
      6 att_gt = ATTgt(data=diddata, cohort_name="course_month_end_date", base_period='varying', freq='M')
----> 7 att_gt.fit(formula = formula, est_method='dr',control_group=control_group, progress_bar = False)

File ~/anaconda3/lib/python3.11/site-packages/differences/attgt/attgt.py:718, in ATTgt.fit(self, formula, weights_name, control_group, base_delta, est_method, as_repeated_cross_section, boot_iterations, random_state, alpha, cluster_var, split_sample_by, n_jobs, backend, progress_bar)
    688     res = get_att_gt(
    689         data=(
    690             self._data_matrix
   (...)
    714         ),
    715     )
    717     # standard errors & ci/cbands
--> 718     res = get_standard_errors(
    719         ntl=res,
    720         cluster_groups=cluster_groups,
    721         alpha=alpha,
    722         boot_iterations=boot_iterations,
    723         random_state=random_state,
    724         n_jobs_boot=n_jobs,
    725         backend_boot=backend,
    726         progress_bar=progress_bar,
    727         sample_name=s if s != "full_sample" else None,
    728         release_workers=s_idx == n_sample_names,
    729     )
    731     self._result_dict[s]["ATTgt_ntl"] = res
    733 self._fit_res = output_dict_to_dataframe(
    734     extract_dict_ntl(self._result_dict),
    735     stratum=bool(self._strata),
    736     date_map=self._map_datetime,
    737 )

File ~/anaconda3/lib/python3.11/site-packages/differences/attgt/attgt_cal.py:442, in get_standard_errors(ntl, cluster_groups, alpha, boot_iterations, random_state, backend_boot, n_jobs_boot, progress_bar, sample_name, release_workers)
    436     raise ValueError(
    437         "'boot_iterations' must be >= 0. "
    438         "If boot_iterations=0, analytic standard errors are computed"
    439     )
    441 # influence funcs + idx for not nan cols
--> 442 inf_funcs, not_nan_idx = stack_influence_funcs(ntl, return_idx=True)
    444 # create an empty array to populate with the standard errors
    445 se_array = np.empty(len(ntl))

File ~/anaconda3/lib/python3.11/site-packages/differences/attgt/utility.py:382, in stack_influence_funcs(ntl, return_idx)
    380     inf_funcs = inf_funcs.toarray()  # faster mboot if dense matrix
    381 else:
--> 382     inf_funcs = np.stack(
    383         [r.influence_func for r in ntl if r.influence_func is not None], axis=1
    384     )
    386 if return_idx:
    387     # indexes for the non-missing influence_func
    388     not_nan_idx = np.array(
    389         [i for i, r in enumerate(ntl) if r.influence_func is not None]
    390     )

File <__array_function__ internals>:200, in stack(*args, **kwargs)

File ~/anaconda3/lib/python3.11/site-packages/numpy/core/shape_base.py:460, in stack(arrays, axis, out, dtype, casting)
    458 arrays = [asanyarray(arr) for arr in arrays]
    459 if not arrays:
--> 460     raise ValueError('need at least one array to stack')
    462 shapes = {arr.shape for arr in arrays}
    463 if len(shapes) != 1:

ValueError: need at least one array to stack

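From what I can tell, the package fails while calculating the standard errors: stack_influence_funcs finds no group-time result with a non-missing influence function, so np.stack receives an empty list. It would be great if you could help with debugging. Thanks.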

achinmay17 • Dec 20 '23 03:12
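The traceback already shows where things go wrong: stack_influence_funcs keeps only the group-time results whose influence_func is not None and passes that list to np.stack, which raises when the list is empty. So the error means that none of the ATT(g,t) cells produced an influence function. The sketch below reproduces that mechanism with plain NumPy; Cell and stack_non_missing are hypothetical stand-ins written for illustration, not part of the differences package.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class Cell:
    """Hypothetical stand-in for one group-time result. Only the
    influence_func attribute, which the traceback's filter reads, is modelled."""

    influence_func: Optional[np.ndarray]


def stack_non_missing(cells: List[Cell]) -> np.ndarray:
    """Keep the cells whose influence function is not None and stack them
    column-wise, as the list comprehension in the traceback does."""
    arrays = [c.influence_func for c in cells if c.influence_func is not None]
    if not arrays:
        # np.stack would fail here with "need at least one array to stack";
        # raise a message that names the actual cause instead.
        raise ValueError(f"none of the {len(cells)} cells has an influence function")
    return np.stack(arrays, axis=1)


# At least one estimated cell: stacking succeeds (5 observations x 2 cells).
ok = [Cell(np.zeros(5)), Cell(None), Cell(np.ones(5))]
print(stack_non_missing(ok).shape)  # (5, 2)

# Every cell missing: the plain np.stack call reproduces the reported error.
all_missing = [Cell(None), Cell(None)]
try:
    np.stack([c.influence_func for c in all_missing if c.influence_func is not None], axis=1)
except ValueError as err:
    print(err)  # need at least one array to stack
```

The useful question for debugging is therefore not the standard-error step itself but why every group-time estimate came back without an influence function.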

Are all your cohorts very small? How unbalanced is the data? Would you be able to share some data to reproduce this error? A simulated dataset that contains the same entity-time structure and cohort composition should do. Thanks!

bernardodionisi • Dec 20 '23 15:12
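To answer the questions above and produce a shareable dataset, a short pandas script can summarise the panel and build a copy that keeps only the entity-time rows and cohort assignments. This is a minimal sketch, assuming the data is indexed by two named levels (entity, time) as in the original post; panel_summary and make_shareable are hypothetical helper names, and the toy values are made up for illustration.

```python
import numpy as np
import pandas as pd


def panel_summary(df: pd.DataFrame, cohort_col: str) -> dict:
    """Summarise a panel indexed by (entity, time): size, balance, cohort sizes."""
    entity_level, time_level = df.index.names
    n_periods = df.index.get_level_values(time_level).nunique()
    periods_per_entity = df.groupby(level=entity_level).size()
    # Assumes the cohort is constant within an entity; take its first non-null value.
    cohort_by_entity = df.groupby(level=entity_level)[cohort_col].first()
    return {
        "n_entities": int(periods_per_entity.size),
        "n_periods": int(n_periods),
        # Balanced only if every entity is observed in every period (no duplicate rows assumed).
        "balanced": bool(periods_per_entity.eq(n_periods).all()),
        "min_periods_per_entity": int(periods_per_entity.min()),
        "entities_per_cohort": cohort_by_entity.value_counts(dropna=False),
    }


def make_shareable(df: pd.DataFrame, cohort_col: str, seed: int = 0) -> pd.DataFrame:
    """Copy the (entity, time) rows and cohort assignments, relabel entities as
    0..n-1 and replace all other columns with a single random outcome 'y'."""
    rng = np.random.default_rng(seed)
    entity_level, time_level = df.index.names
    codes, _ = pd.factorize(df.index.get_level_values(entity_level))
    index = pd.MultiIndex.from_arrays(
        [codes, df.index.get_level_values(time_level)],
        names=[entity_level, time_level],
    )
    return pd.DataFrame(
        {cohort_col: df[cohort_col].to_numpy(), "y": rng.normal(size=len(df))},
        index=index,
    )


# Toy unbalanced panel (hypothetical values): entity 20 has only one month.
idx = pd.MultiIndex.from_tuples(
    [(10, "2020-01"), (10, "2020-02"), (20, "2020-01")],
    names=["id", "month_end_date"],
)
toy = pd.DataFrame({"cohort": ["2020-02", "2020-02", "2020-03"], "x": [1.0, 2.0, 3.0]}, index=idx)
print(panel_summary(toy, "cohort"))
print(make_shareable(toy, "cohort"))
```

With the data from the original post this would be called as panel_summary(diddata, "course_month_end_date"). Note that make_shareable drops every other column, so any covariates used in the formula would need to be simulated separately.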

@bernardodionisi


I have the same problem (ValueError: need at least one array to stack). My data is very unbalanced and the cohorts are small. Do you have a solution?

jonahnieuwenhuijzen • May 30 '24 11:05

Hi @jonahnieuwenhuijzen

Have you tried different estimation methods via the est_method parameter? The default is dr-mle; you may try dr-ipt, which changes how the propensity scores are estimated. Could you also experiment with the other methods? Let me know if it helps.

bernardodionisi • May 30 '24 14:05
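Following the suggestion to try other estimation methods, a small loop can record which est_method values fit without raising. This is a hypothetical sketch: try_est_methods and fake_fit are illustrative names, only 'dr-mle' and 'dr-ipt' come from this thread (check the package documentation for the full list), and the commented wiring assumes fit accepts the keyword arguments shown in the original post. Building a fresh ATTgt for each method avoids relying on whether fit can safely be called repeatedly on one object.

```python
from typing import Callable, Dict, Iterable, Optional


def try_est_methods(
    fit: Callable[[str], object], methods: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Run fit(method) for each method and record None on success or the
    ValueError message on failure, so the methods can be compared side by side."""
    outcome: Dict[str, Optional[str]] = {}
    for method in methods:
        try:
            fit(method)
            outcome[method] = None
        except ValueError as err:
            outcome[method] = str(err)
    return outcome


def fake_fit(method: str) -> None:
    """Stand-in for a real fit that fails only for one method."""
    if method == "dr-mle":
        raise ValueError("need at least one array to stack")


print(try_est_methods(fake_fit, ["dr-mle", "dr-ipt"]))
# {'dr-mle': 'need at least one array to stack', 'dr-ipt': None}

# Hypothetical wiring with the real package, using the names from the original post:
# from differences import ATTgt
# try_est_methods(
#     lambda m: ATTgt(data=diddata, cohort_name="course_month_end_date",
#                     base_period="varying", freq="M").fit(
#         formula=formula, est_method=m,
#         control_group="not_yet_treated", progress_bar=False),
#     ["dr-mle", "dr-ipt"],
# )
```

Only ValueError is caught, so unrelated failures (for example a typo in the formula) still surface normally.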