
EIA Bulk Electricity Aggregates

Status: Open · TrentonBush opened this issue 1 year ago · 3 comments

This draft PR covers the E and T of ETL. Metadata has not been filled in yet, and some tables are awkwardly placed. I have two main questions:

  1. Where should we put the association tables that define the EIA aggregates (the three global variables in the transform module)?
  2. Is the intention of this ETL to faithfully represent the EIA bulk electricity aggregates as an end in itself, or to focus more narrowly on supporting the removal/replacement of the EIA API dependency? The latter is smaller in scope and would lead to cleaner code and integration.

The current state of this PR is a somewhat confusing mix of those two scopes. The transform currently produces two outputs: a metadata table and a timeseries table. The metadata table translates between EIA's internal IDs, codes, abbreviations, and descriptions of each aggregate timeseries. The timeseries table is currently in a tidy format that, while convenient for analysis, obscures the connection between EIA's series IDs and the values in the dataframe. (EIA presents the cost and receipt data as separate series, each with its own 5 dimensions and 1 value. I reorganized the data into two value columns, costs and receipts, that share 5 common index columns: fuel, region, sector, frequency, and timestamp.)
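To make the reshaping concrete, here is a minimal pandas sketch of going from "one series per measure" to "two value columns sharing five indices". The column names (`measure`, `value`, `cost`, `receipts`) and the toy values are illustrative assumptions, not the actual names or data used in the PR.

```python
import pandas as pd

# Toy long-format input: one row per observation of an EIA-style series.
# "measure" says whether the row belongs to the cost series or the receipts
# series; the other five columns are the dimensions both series share.
# All names and values here are hypothetical.
index_cols = ["fuel", "region", "sector", "frequency", "timestamp"]
long_df = pd.DataFrame(
    {
        "measure": ["cost", "receipts", "cost", "receipts"],
        "fuel": ["coal"] * 4,
        "region": ["US", "US", "TX", "TX"],
        "sector": ["all"] * 4,
        "frequency": ["monthly"] * 4,
        "timestamp": pd.to_datetime(["2020-01-01"] * 4),
        "value": [2.0, 1000.0, 1.8, 200.0],
    }
)

# Pivot so each measure becomes its own value column, indexed by the five
# shared dimensions. pivot() raises if any (index, measure) pair repeats.
tidy = long_df.pivot(index=index_cols, columns="measure", values="value")
tidy.columns.name = None  # drop the leftover "measure" label on the columns
tidy = tidy.reset_index()
print(tidy)
```

The trade-off described above is visible here: after the pivot, nothing in a row says which EIA series its `cost` or `receipts` value came from.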

If we are focused on removing the API dependency, I think we can drop the metadata table entirely: the keys are already joined into the timeseries data, and we don't need to document nomenclature, abbreviations, and IDs that we won't use. If we prefer the larger scope of faithfully integrating the bulk aggregates, then we might restructure the timeseries so that each value is more directly identifiable using EIA's "key-value pair" format.
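For the larger-scope option, one way to recover per-series identity is to melt the tidy table back to one row per (series, timestamp). This is only a sketch: the `series_key` built below is a hypothetical dot-joined stand-in and is NOT EIA's real series ID format, and the function name is made up for illustration.

```python
import pandas as pd


def to_series_format(
    tidy: pd.DataFrame, index_cols: list[str], value_cols: list[str]
) -> pd.DataFrame:
    """Melt the tidy table back to one row per (series_key, timestamp).

    Hypothetical helper: series_key is an illustrative stand-in, not EIA's ID format.
    """
    dims = [c for c in index_cols if c != "timestamp"]
    long_df = tidy.melt(
        id_vars=index_cols, value_vars=value_cols, var_name="measure", value_name="value"
    )
    # Join the measure name and the non-time dimensions into one string key per series.
    long_df["series_key"] = long_df[["measure", *dims]].astype(str).apply(".".join, axis=1)
    return (
        long_df[["series_key", "timestamp", "value"]]
        .sort_values(["series_key", "timestamp"])
        .reset_index(drop=True)
    )


# Usage with toy data in the two-value-column layout described above.
index_cols = ["fuel", "region", "sector", "frequency", "timestamp"]
tidy = pd.DataFrame(
    {
        "fuel": ["coal", "coal"],
        "region": ["TX", "US"],
        "sector": ["all", "all"],
        "frequency": ["monthly", "monthly"],
        "timestamp": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "cost": [1.8, 2.0],
        "receipts": [200.0, 1000.0],
    }
)
series = to_series_format(tidy, index_cols, ["cost", "receipts"])
print(series)
```

If we stick with the narrower scope, none of this is needed; the tidy table is already keyed by the shared dimensions.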

TrentonBush, Sep 16 '22 08:09


On your 2nd question: for now I think we just want the specific tidy table that we can use to fill in & estimate missing fuel prices in the Fuel Receipts & Costs table.
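A minimal sketch of how that tidy table might be used to fill missing Fuel Receipts & Costs prices: a left merge on the shared keys followed by `fillna`. All column names (`fuel_cost`, `cost`, `fuel_cost_filled`), the choice of join keys, and the function name are hypothetical assumptions for illustration; the real integration may match on different columns or apply extra filtering.

```python
import pandas as pd


def fill_missing_fuel_cost(
    frc: pd.DataFrame, aggregates: pd.DataFrame, keys: list[str]
) -> pd.DataFrame:
    """Fill NaN fuel costs in frc with the matching aggregate cost (hypothetical helper)."""
    agg = aggregates[[*keys, "cost"]].rename(columns={"cost": "agg_cost"})
    # validate raises a MergeError if the aggregates have more than one row per key,
    # so callers must first filter the aggregates to a single sector/frequency.
    merged = frc.merge(agg, on=keys, how="left", validate="many_to_one")
    # Record which rows were estimated rather than reported.
    merged["fuel_cost_filled"] = merged["fuel_cost"].isna() & merged["agg_cost"].notna()
    merged["fuel_cost"] = merged["fuel_cost"].fillna(merged["agg_cost"])
    return merged.drop(columns="agg_cost")


# Toy usage: two FRC rows are missing a fuel cost.
frc = pd.DataFrame(
    {
        "plant_id": [1, 2, 3],
        "fuel": ["coal", "coal", "coal"],
        "region": ["TX", "TX", "US"],
        "timestamp": pd.to_datetime(["2020-01-01"] * 3),
        "fuel_cost": [2.1, None, None],
    }
)
aggregates = pd.DataFrame(
    {
        "fuel": ["coal", "coal"],
        "region": ["TX", "US"],
        "timestamp": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "cost": [1.8, 2.0],
    }
)
result = fill_missing_fuel_cost(frc, aggregates, ["fuel", "region", "timestamp"])
print(result)
```

Keeping a flag column like `fuel_cost_filled` makes it easy to tell reported prices from estimated ones downstream.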

If we decide to do a more generalized transformation of all the bulk Electricity data, then I think we'll want to figure out a generalized process to deduplicate, normalize and store all of the metadata appropriately, and tie it to the data tables. But I think that's Out Of Scope™ here for sure.

zaneselvans, Sep 16 '22 18:09

Codecov Report

Base: 82.4% // Head: 83.0% // Increases project coverage by +0.5%

Coverage data is based on head (cc24ac5) compared to base (64f0edd). Patch coverage: 100.0% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             dev   #1937     +/-   ##
=======================================
+ Coverage   82.4%   83.0%   +0.5%     
=======================================
  Files         64      67      +3     
  Lines       7092    7141     +49     
=======================================
+ Hits        5849    5931     +82     
+ Misses      1243    1210     -33     
Impacted Files Coverage Δ
src/pudl/metadata/classes.py 82.0% <ø> (ø)
src/pudl/metadata/fields.py 100.0% <ø> (ø)
src/pudl/metadata/resources/ferc714.py 100.0% <ø> (ø)
src/pudl/metadata/sources.py 100.0% <ø> (ø)
src/pudl/output/pudltabl.py 88.2% <ø> (ø)
src/pudl/etl.py 89.7% <100.0%> (+0.5%) ↑
src/pudl/extract/eia_bulk_elec.py 100.0% <100.0%> (ø)
src/pudl/metadata/dfs.py 100.0% <100.0%> (ø)
src/pudl/metadata/resources/eia_bulk_elec.py 100.0% <100.0%> (ø)
src/pudl/output/eia923.py 98.1% <100.0%> (+17.0%) ↑
... and 2 more


codecov[bot], Sep 17 '22 22:09

It looks like the docs build is failing due to a formatting error in eia_bulk_elec.py.

zaneselvans, Sep 29 '22 14:09