
[Feature Request]: Static Panel Data

Open JanTeichertKluge opened this issue 6 months ago • 1 comment

  • Implementation of DML for Static Panel Data
  • Reference: Paper

JanTeichertKluge avatar May 27 '25 09:05 JanTeichertKluge

Notes, 27/05/2025

Considerations for implementing static panel data models (Clarke and Polselli, 2023) in the package. Thanks to @SvenKlaassen for the initial input today.

We discussed the following:

  • Julian uses ID as the cluster variable for cluster-robust standard errors (SEs).
  • Sven has built a panel data class, which may fit the Static Panel Data Model as well.
  • Cluster data should be implemented in base data.
  • Create custom data backends/classes for model submodules.
    • The base data backend should include $Y$, $D$, $X$, and instruments $Z$.
  • n_obs is needed for variance estimation (for panel data, n_ids = total observations, scaled by sqrt(n * t)). Different asymptotics may require a different n_obs.
  • Stay in contact with Sven and link GitHub issues.
  • DID-binary builds a wide dataset via pre- and eval times.
  • Directly implement DGP from the paper and set up unit tests.
  • Put PLR panel data model in PLM submodule.
  • Treatment variable: check type (float).
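
To make the notes on clustering by ID and on n_obs concrete, here is a minimal sketch, not the package's implementation. It swaps the DML score for a plain no-intercept OLS slope, sums the score within each unit ID, and reports both counts (number of units and number of rows) that a variance estimator might scale by. The function name `cluster_robust_slope` and the toy data are hypothetical, and no small-sample correction is applied.

```python
from collections import defaultdict
from math import sqrt


def cluster_robust_slope(ids, d, y):
    """Slope of y on d (no intercept) with a standard error clustered on ids.

    Illustrative only: a plain OLS score stands in for the DML score.
    """
    sum_dd = sum(di * di for di in d)
    theta = sum(di * yi for di, yi in zip(d, y)) / sum_dd

    # Sum the score d * residual within each cluster (here: the unit ID).
    cluster_scores = defaultdict(float)
    for unit, di, yi in zip(ids, d, y):
        cluster_scores[unit] += di * (yi - theta * di)

    # Sandwich variance: the "meat" uses squared cluster sums, not squared rows.
    meat = sum(s * s for s in cluster_scores.values())
    se = sqrt(meat) / sum_dd

    n_ids = len(cluster_scores)  # number of units (clusters)
    n_obs = len(y)               # number of rows (unit-period pairs)
    return theta, se, n_ids, n_obs


# Toy panel: 3 units observed in 2 periods each.
ids = [1, 1, 2, 2, 3, 3]
d = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
y = [2.1, 3.9, 1.8, 4.2, 2.0, 4.1]

theta, se, n_ids, n_obs = cluster_robust_slope(ids, d, y)
print(theta, se, n_ids, n_obs)
```

With this toy panel the function sees 3 clusters but 6 rows, which is the distinction the n_obs point raises: which of the two counts enters the scaling depends on the asymptotics being assumed.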

Open Points:

  • Static panel data methods often rely on data preprocessing (within-group transformations, correlated random effects, detrending, ...). Should this be left to the user, or should the model class (or the data backend) do it?
  • If the base data class implements the cluster options, how should the workflow distinguish cluster data from non-cluster data? By flag indicators or by dynamic properties of the dataframe (cluster_id = None or similar)?
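
To make both open points easier to discuss, here is a rough sketch of one possible answer to each, under stated assumptions. The class `PanelDataSketch`, its fields, and its methods are hypothetical and are not part of the doubleml API. The sketch treats "cluster data or not" as a property derived from an optional cluster column (the `cluster_id = None` variant), and it places the within-group transformation on the data container instead of leaving it to the user.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PanelDataSketch:
    """Hypothetical container; not the doubleml data backend."""

    columns: Dict[str, list]
    y_col: str
    d_col: str
    id_col: str
    cluster_col: Optional[str] = None  # None means "no clustering"

    @property
    def is_cluster_data(self) -> bool:
        # Derived from the data definition instead of a separate flag.
        return self.cluster_col is not None

    def within_transform(self, col: str) -> list:
        """Subtract each unit's mean from its own observations."""
        ids = self.columns[self.id_col]
        values = self.columns[col]
        sums, counts = {}, {}
        for unit, value in zip(ids, values):
            sums[unit] = sums.get(unit, 0.0) + value
            counts[unit] = counts.get(unit, 0) + 1
        return [value - sums[unit] / counts[unit]
                for unit, value in zip(ids, values)]


raw = {
    "id": [1, 1, 2, 2],
    "y": [1.0, 3.0, 10.0, 14.0],
    "d": [0.0, 1.0, 0.0, 1.0],
}

data = PanelDataSketch(columns=raw, y_col="y", d_col="d", id_col="id")
clustered = PanelDataSketch(columns=raw, y_col="y", d_col="d",
                            id_col="id", cluster_col="id")

y_within = data.within_transform("y")
print(y_within, data.is_cluster_data, clustered.is_cluster_data)
```

Deriving `is_cluster_data` from the column keeps a single source of truth, at the cost of making the behaviour implicit. An explicit flag would be easier to spot in user code but could drift out of sync with the data.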
