Expanded datasets with multiple imputations per record
taxdata datasets currently assign a single realization of each imputation model per record. This keeps the datasets small, but discards the information held in the stochastic imputation models.
In addition to the base datasets, taxdata could offer expanded datasets with multiple imputations per record, which would support more robust analysis. For example, a dataset with 10 imputations per record would have 10 records for each current record, each with s006 divided by 10. Since many imputed variables are often zero, the expanded dataset would be larger, but not necessarily 10x larger: replicates whose imputations come out equal can be consolidated back into a single record. One could imagine a workflow where analysts do trial runs of their analysis against the current datasets for speed, then rerun against the expanded dataset for a final, slower analysis.
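A minimal sketch of the expansion step, using pandas. The `impute` callable and `imputed_var` column are hypothetical stand-ins for a taxdata imputation model; the key properties shown are that the per-record weight `s006` is split across replicates so aggregates are preserved, and that replicates with identical imputations are consolidated, which is why the file grows by less than the replication factor.

```python
import numpy as np
import pandas as pd

def expand_with_imputations(df, impute, n=10, weight="s006", seed=0):
    """Replicate each record n times, drawing a fresh imputation per copy
    and dividing the weight by n so weighted totals are preserved.

    `impute` is a hypothetical callable: given a DataFrame and a random
    generator, it returns one stochastic draw (a Series) per record.
    """
    rng = np.random.RandomState(seed)
    copies = []
    for _ in range(n):
        copy = df.copy()
        copy[weight] = copy[weight] / n
        copy["imputed_var"] = impute(copy, rng)  # one draw per replicate
        copies.append(copy)
    expanded = pd.concat(copies, ignore_index=True)
    # Consolidate: replicates of a record whose imputations came out equal
    # collapse to one row with summed weight, so the expanded file is
    # larger than the base file but smaller than n times its size.
    group_cols = [c for c in expanded.columns if c != weight]
    return expanded.groupby(group_cols, as_index=False)[weight].sum()
```

Because many imputed values are zero, the groupby step often merges most replicates of a record, keeping the expanded file well under n times the base size.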
As more variables are imputed (e.g., #221), the value of these extra imputations will increase.