open-grid-emissions
Handling missing data in EIA-930
How is missing data handled in the EIA-930 data cleaning/reconciliation process?
Whenever data is missing in EIA-930, it appears that a value of 1.0 is assigned to those hours. We may want to preserve NA values instead, but I'm not sure whether that would break the physics reconciliation process. Could 0 be used instead of 1.0?
Looking at the EIA-930 data after it has gone through the reconciliation process: of the ~5.2 million rows in eia930_data, approximately 3.2 million are equal to 1.0 +/- 0.001. There are also no values equal to zero anywhere in the dataframe. It seems that the EIA-930 data cleaning process treats reported zeros, negative values, and missing data the same way, assigning them all a value of 1.
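For anyone who wants to reproduce this kind of check, here is a minimal sketch of how the counts could be computed with pandas. The function name `summarize_placeholder_values` and the toy column names are hypothetical and only for illustration; the real eia930_data frame would be passed in place of `toy`.

```python
import numpy as np
import pandas as pd


def summarize_placeholder_values(df, placeholder=1.0, tol=0.001):
    """Count near-placeholder, zero, negative, and missing entries per numeric column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame(
        {
            # NaN compares as False here, so missing values are not counted as 1.0
            "near_placeholder": ((numeric - placeholder).abs() <= tol).sum(),
            "zero": (numeric == 0).sum(),
            "negative": (numeric < 0).sum(),
            "missing": numeric.isna().sum(),
        }
    )


# Toy data standing in for eia930_data; column names are illustrative only.
toy = pd.DataFrame(
    {
        "demand": [1.0, 250.0, 1.0005, np.nan],
        "nuclear_generation": [1.0, 1.0, 1.0, 1.0],
    }
)
summary = summarize_placeholder_values(toy)
print(summary)
```

A column where nearly every value falls in the `near_placeholder` bucket (like `nuclear_generation` in the toy data) is a likely candidate for a series that was filled rather than reported.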
I'm not sure whether this data cleaning step is a requirement of the physics based optimization (can the optimization not handle negative or zero values?), or whether this could be changed.
Ideally, we'd like to preserve negative, missing, and zero values in their original form. Where values are missing or anomalous, it might make sense to add an imputation step, but there should be a better method than assigning every such value 1.
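As one possible direction (not something the current pipeline or gridemissions does, as far as I know), here is a rough sketch of an imputation step that leaves reported zeros and negatives untouched and only fills short interior gaps by linear interpolation. The helper name `impute_short_gaps` and the `max_gap` parameter are made up for this example.

```python
import numpy as np
import pandas as pd


def impute_short_gaps(series, max_gap=2):
    """Linearly interpolate interior NaNs, filling at most max_gap values per gap.

    Leading and trailing NaNs are left as NaN (limit_area="inside").
    Note that pandas partially fills gaps longer than max_gap rather than
    skipping them entirely.
    """
    return series.interpolate(method="linear", limit=max_gap, limit_area="inside")


# Toy hourly series: one interior gap, a reported zero, a negative value,
# and a trailing missing value.
hourly = pd.Series([100.0, np.nan, 120.0, 0.0, -5.0, np.nan])
filled = impute_short_gaps(hourly)
print(filled)
```

The reported 0.0 and -5.0 pass through unchanged, the interior gap is filled from its neighbors, and the trailing gap stays NA instead of becoming 1.0.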
Most of the zeros (~5,000,000) are added in the physics-based cleaning step, mostly in fuel-specific generation columns that are not present in the original EIA-930 data (e.g., NUC generation for a small BA with no nuclear power plant).
About 2,000 values of 1.0 are added in each of the basic and rolling cleaning steps.
Addressing this issue should include:
- Physics-based reconciliation should not require generation from fuels not found in a BA
- Zero values should be allowed
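Until the reconciliation itself is changed, a post-hoc workaround for the first point might look like the sketch below: any column in the cleaned output that has no observations at all in the raw data is reset to NA. This is only an illustration under assumed inputs; `mask_unreported_columns` and the column names are hypothetical, and it does not alter how the physics-based step works.

```python
import numpy as np
import pandas as pd


def mask_unreported_columns(raw, cleaned):
    """Return a copy of cleaned where columns never observed in raw are set to NaN."""
    out = cleaned.copy()
    for col in out.columns:
        # A column counts as unreported if it is absent from raw or entirely NaN there
        if col not in raw.columns or raw[col].isna().all():
            out[col] = np.nan
    return out


# Toy frames: the BA reports hydro but has no nuclear column in the raw data.
raw = pd.DataFrame({"BA1_hydro": [10.0, 0.0, np.nan]})
cleaned = pd.DataFrame(
    {
        "BA1_hydro": [10.0, 1.0, 1.0],
        "BA1_nuclear": [1.0, 1.0, 1.0],
    }
)
masked = mask_unreported_columns(raw, cleaned)
print(masked)
```

This only removes series for fuels that a BA never reports. It does not recover zeros or missing hours that were overwritten inside a reported column, which would still need to be fixed in the cleaning steps themselves.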
As a next step, should we post this as an issue/question on the gridemissions repo?
https://github.com/jdechalendar/gridemissions/issues/8
Based on this comment, it seems that this method was implemented in gridemissions out of convenience rather than necessity, so it should be possible to update that repo as part of v2.