pudl
EIA Bulk Electricity Aggregates
This draft PR implements the E and T of ETL. Metadata has not been entered yet, and some tables are awkwardly placed. I have two main questions:
- Where should the association tables that define the EIA aggregates go (the three global variables in the transform module)?
- Is the intention of this ETL to faithfully represent the EIA bulk electricity aggregates as an end unto itself, or to focus more narrowly on supporting the removal/replacement of the EIA API dependency? The latter is smaller in scope and would lead to cleaner code/integration.
The current state of this PR is a somewhat confusing mix of those two scopes. There are currently two outputs of the transform: a metadata table and a timeseries table. The metadata table translates between EIA's internal IDs, codes, abbreviations, and descriptions of each aggregate timeseries. The timeseries table is currently in a tidy-style format that, while convenient for analysis, obscures the connection between EIA's series IDs and the values in the dataframe. (EIA presents the cost and receipt data as separate series, each with its own 5 dimensions and 1 value. I reorganized the data into two value columns (costs and receipts) that share their 5 common indices of fuel, region, sector, frequency, and timestamp.)
If we are focused on removing the API dependency I think we can drop the metadata table entirely -- the keys are already joined into the timeseries data and we don't need to care about documenting nomenclature, abbreviations, and IDs that we won't use. If we prefer the larger scope of faithfully integrating the bulk aggregates, then we might restructure the timeseries to be more directly identifiable using EIA's "key-value pair" format.
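To make the two layouts concrete, here is a minimal sketch of the reshape described above, written in plain Python so it runs without dependencies (the same thing could be done with a pandas pivot). The series ID pattern, the field names, and the values are all made up for illustration; they are not copied from EIA's bulk file or from this PR's transform module.

```python
from collections import defaultdict

# Hypothetical key-value style records: one row per (series, timestamp).
# IDs and values are invented for illustration only.
long_records = [
    {"series_id": "COST.BIT-AL-1.M", "date": "2020-01", "value": 2.5},
    {"series_id": "RECEIPTS.BIT-AL-1.M", "date": "2020-01", "value": 100.0},
    {"series_id": "COST.BIT-AL-1.M", "date": "2020-02", "value": 2.6},
]


def parse_series_id(series_id):
    """Split an assumed 'KIND.FUEL-REGION-SECTOR.FREQ' ID into its parts."""
    kind, dims, freq = series_id.split(".")
    fuel, region, sector = dims.split("-")
    return kind.lower(), (fuel, region, sector, freq)


def to_tidy(records):
    """Collapse cost and receipts series onto their 5 shared indices."""
    rows = defaultdict(lambda: {"cost": None, "receipts": None})
    for rec in records:
        kind, dims = parse_series_id(rec["series_id"])
        key = dims + (rec["date"],)  # fuel, region, sector, frequency, timestamp
        rows[key][kind] = rec["value"]
    return dict(rows)


tidy = to_tidy(long_records)
```

In the tidy result each key is the 5-part index and each value holds both measurements, which is convenient for analysis but no longer shows which series ID a given number came from. That is the trade-off between the two scopes.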
On your 2nd question, for now I think that we just want the specific tidy table that we can deploy to fill in & estimate missing fuel prices in the Fuel Receipts & Costs table.
If we decide to do a more generalized transformation of all the bulk Electricity data, then I think we'll want to figure out a generalized process to deduplicate, normalize and store all of the metadata appropriately, and tie it to the data tables. But I think that's Out Of Scope™ here for sure.
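As a rough illustration of the narrower use case, filling in missing fuel prices from the aggregates, a sketch might look like the following. The field names (`fuel`, `state`, `month`, `fuel_cost`) and the lookup key are assumptions for the example, not the actual Fuel Receipts & Costs schema or the estimation method PUDL uses.

```python
def fill_missing_prices(receipts, aggregate_prices):
    """Fill missing per-delivery prices from a (fuel, state, month) aggregate.

    Both inputs use hypothetical field names; reported prices are never
    overwritten, and rows with no matching aggregate stay missing.
    """
    filled = []
    for row in receipts:
        price = row["fuel_cost"]
        was_missing = price is None
        if was_missing:
            key = (row["fuel"], row["state"], row["month"])
            price = aggregate_prices.get(key)
        filled.append(
            {**row, "fuel_cost": price, "filled": was_missing and price is not None}
        )
    return filled


# Made-up example data.
aggregates = {("BIT", "AL", "2020-01"): 2.5}
deliveries = [
    {"fuel": "BIT", "state": "AL", "month": "2020-01", "fuel_cost": None},
    {"fuel": "BIT", "state": "AL", "month": "2020-01", "fuel_cost": 2.1},
    {"fuel": "NG", "state": "TX", "month": "2020-01", "fuel_cost": None},
]
result = fill_missing_prices(deliveries, aggregates)
```

Keeping a `filled` flag alongside the price makes it possible to tell reported values from estimated ones downstream.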
Codecov Report
Base: 82.4% // Head: 83.0% // Increases project coverage by +0.5%
Coverage data is based on head (cc24ac5) compared to base (64f0edd). Patch coverage: 100.0% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## dev #1937 +/- ##
=======================================
+ Coverage 82.4% 83.0% +0.5%
=======================================
Files 64 67 +3
Lines 7092 7141 +49
=======================================
+ Hits 5849 5931 +82
+ Misses 1243 1210 -33
Impacted Files | Coverage Δ | |
---|---|---|
src/pudl/metadata/classes.py | 82.0% <ø> (ø) | |
src/pudl/metadata/fields.py | 100.0% <ø> (ø) | |
src/pudl/metadata/resources/ferc714.py | 100.0% <ø> (ø) | |
src/pudl/metadata/sources.py | 100.0% <ø> (ø) | |
src/pudl/output/pudltabl.py | 88.2% <ø> (ø) | |
src/pudl/etl.py | 89.7% <100.0%> (+0.5%) | :arrow_up: |
src/pudl/extract/eia_bulk_elec.py | 100.0% <100.0%> (ø) | |
src/pudl/metadata/dfs.py | 100.0% <100.0%> (ø) | |
src/pudl/metadata/resources/eia_bulk_elec.py | 100.0% <100.0%> (ø) | |
src/pudl/output/eia923.py | 98.1% <100.0%> (+17.0%) | :arrow_up: |
... and 2 more | | |
It looks like the docs build is failing due to a formatting error in eia_bulk_elec.py.