pudl
pudl copied to clipboard
Integrate respondent frequency in EIA-923 tables
Certain plants only report data to the generator table and boiler fuel table on an annual basis. In this case, their annual total is reported as a single value in December, and the other 11 months are reported as missing values. Previously, this annually-reported data was only being used to allocate generation and fuel data in December, instead of for the entire year. The raw EIA-923 table includes a column for 'Respondent Frequency' (which includes a code of 'A' for annual and 'M' for monthly). In theory this column could be used to identify these plants, but but this column is not currently included in pudl.
Whaaat? Well we should add that. Where does it show up in the spreadsheets? What entity is it associated with (generator, plant, utility?)
That's wild that there's some December-only records for the whole year. How annoying. I wonder what all other tables have this dynamic going on.
Looks like it shows up in at least:
-
boiler_fuel_eia923
(Seems like Annual records only have data in December) -
generation_eia923
(Seems like Annual records only have data in December) -
generation_fuel_eia923
(Many non-December records show up with data in them, but a lot are zero) -
fuel_receipts_costs_eia923
(Data shows up in all months for records designated Annual)
Is a substantial fraction of overall fuel / generation being reported annually?
To deal with this correctly it seems like we'd really need to break out the annual and monthly reporting into separate tables that have different temporal resolution.
Originally posted by @zaneselvans in https://github.com/catalyst-cooperative/pudl/issues/1608#issuecomment-1183913344
Tasks (to turn into issues...)
- [ ] Update spreadsheet extraction metadata to retain Respondent Frequency in all sheets where it exists across all EIA years.
- [ ] Identify what entity the Respondent Frequency should be associated with (plant? generator? utility?)
- [ ] Update EIA transform + harvesting functions to add Respondent Frequency to the appropriate entity table(s).
- [ ] Identify which tables, and which types of records in those tables, need to be processed differently than they are now, given the different reporting frequencies.
- [ ] Figure out how to integrate the different frequencies of data appropriately in the normalized database. E.g.
- Create separate monthly and annual resolution data tables containing the M+MA and A records respectively.
- Allocate the single annually reported value across all of the months.
- Remove the Jan-Nov records for Annually reported data rather than having NA values in those months.
- [ ] Figure out how to integrate both monthly and annually reported data into analyses which depend on this data. E.g.
- Create a derived data product with estimated monthly values for all plants, based on actually reported monthly data and the annually reported data allocated to all the months using an imputation method that learns from the patterns of other plants which do report monthly data to determine the monthly allocations.
According the the EIA-923 file layout sheet, the different types of respondent frequencies are:
Looking at the 2020 EIA-923 data:
- 181/9319 records report annually in the boiler fuel table
- 139/3844 records report annually in the generation table
Generally what this looks like is: