open-grid-emissions

Handling missing data in EIA-930

Open grgmiller opened this issue 2 years ago • 5 comments

How is missing data handled in the EIA-930 data cleaning/reconciliation process?

Whenever data is missing in EIA-930, a value of 1.0 appears to be assigned to those hours. We may want to preserve NA values instead, but I'm not sure whether this would break the physics reconciliation process. Could 0 be used instead of 1.0?

grgmiller Jun 18 '22 23:06

Examining the EIA-930 data after it has gone through the reconciliation process: of the ~5.2 million rows of data in eia930_data, approximately 3.2 million rows are equal to 1.0 +/- 0.001, and no values in the entire dataframe are equal to zero. It seems that the EIA-930 data cleaning process treats reported zeros, negative values, and missing data the same way, assigning them all a value of 1.
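A check like the one described above can be sketched in plain Python. This is only an illustration: the function name `summarize_values`, the tolerance default, and the sample values are hypothetical and are not taken from the open-grid-emissions or gridemissions code.

```python
import math


def summarize_values(values, placeholder=1.0, tol=1e-3):
    """Count placeholder-like, zero, negative, missing, and other entries."""
    counts = {"placeholder": 0, "zero": 0, "negative": 0, "missing": 0, "other": 0}
    for v in values:
        if v is None or math.isnan(v):
            counts["missing"] += 1
        elif abs(v - placeholder) <= tol:
            # Within +/- tol of the suspected fill value (1.0 by default).
            counts["placeholder"] += 1
        elif v == 0:
            counts["zero"] += 1
        elif v < 0:
            counts["negative"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical hourly values for one series after reconciliation.
sample = [1.0, 1.0004, 250.0, 0.9995, 1.0, 87.5]
result = summarize_values(sample)
```

Running the same summary on the raw and the cleaned series side by side would show whether zeros, negatives, and missing values in the raw data all end up in the "placeholder" bucket after cleaning.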

I'm not sure whether this data cleaning step is a requirement of the physics-based optimization (can the optimization not handle negative or zero values?), or whether it could be changed.

Ideally, we'd like to preserve negative, missing, and zero values in their original form. Where there are missing or anomalous values, it might make sense to implement some sort of imputation step, but there should be a better method than simply assigning a value of 1.
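As one possible shape for such an imputation step, here is a minimal sketch that linearly interpolates short interior gaps and leaves everything else alone. Linear interpolation and the `max_gap` limit are assumptions chosen for illustration, not a method proposed in this thread, and `interpolate_gaps` is a hypothetical helper.

```python
import math


def interpolate_gaps(values, max_gap=3):
    """Linearly fill interior runs of NaN up to max_gap long.

    Zeros and negative values are kept as reported. Gaps at either end of
    the series, or longer than max_gap, stay NaN.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        start = i
        while i < n and math.isnan(out[i]):
            i += 1
        end = i  # first index after the run of NaN
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # no valid neighbor on one side, or gap too long
        left, right = out[start - 1], out[end]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            out[start + k] = left + step * (k + 1)
    return out


# Hypothetical hourly series with an interior gap, a zero, a negative, and a trailing gap.
hourly = [10.0, float("nan"), float("nan"), 40.0, 0.0, -5.0, float("nan")]
filled = interpolate_gaps(hourly)
```

In this example the two interior missing hours are filled between 10.0 and 40.0, the reported 0.0 and -5.0 pass through unchanged, and the trailing missing hour stays missing rather than being set to a placeholder.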

grgmiller Jun 28 '22 16:06

Most of the zeros (~5,000,000) are added in the physics-based cleaning step, mostly in fuel-specific generation columns not present in the original EIA-930 data (e.g., NUC generation for a small BA with no nuclear power plant).

About 2,000 values of 1.0 are added in each of the basic and rolling cleaning steps.

Addressing this issue should include:

  • Physics-based reconciliation should not require generation from fuels not found in a BA
  • Zero values should be allowed
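The first point could be approached by restricting the reconciled output to fuels that a BA actually reports. The sketch below assumes the data is held as a mapping from fuel code to an hourly series; the helper names and sample values are hypothetical and do not reflect how gridemissions structures its data.

```python
import math


def reported_fuels(raw_by_fuel):
    """Return fuels with at least one non-missing value in the raw data.

    A reported zero counts as reported; only all-missing series are excluded.
    """
    return [
        fuel
        for fuel, series in raw_by_fuel.items()
        if any(v is not None and not math.isnan(v) for v in series)
    ]


def restrict_to_reported(cleaned_by_fuel, raw_by_fuel):
    """Drop cleaned series for fuels the BA never reported in the raw data."""
    keep = set(reported_fuels(raw_by_fuel))
    return {fuel: s for fuel, s in cleaned_by_fuel.items() if fuel in keep}


nan = float("nan")
# Hypothetical raw data for a small BA with no nuclear plant.
raw = {"NG": [120.0, 0.0, nan], "WAT": [0.0, 0.0, 0.0], "NUC": [nan, nan, nan]}
# Hypothetical cleaned output in which every fuel column was filled.
cleaned = {"NG": [120.0, 1.0, 1.0], "WAT": [1.0, 1.0, 1.0], "NUC": [1.0, 1.0, 1.0]}

restricted = restrict_to_reported(cleaned, raw)
```

Here WAT is kept even though every reported value is zero, which is consistent with allowing zero values, while NUC is dropped because the BA never reported it.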

gailin-p Jun 29 '22 14:06

As a next step, should we post this as an issue/question on the gridemissions repo?

grgmiller Jun 29 '22 15:06

https://github.com/jdechalendar/gridemissions/issues/8

gailin-p Jun 29 '22 15:06

Based on this comment, it seems that this method was implemented in gridemissions out of convenience rather than necessity, so it should be possible to update that repo as part of v2.

grgmiller Aug 09 '22 16:08