pypsa-usa

Refactor `build_demand.py` to use EIA930 data from PUDL

Open jpvelez opened this issue 1 year ago • 3 comments

Feature Request

Catalyst Coop is currently in the midst of adding EIA930 data to PUDL.

They just produced their first nightly build that includes (relatively unprocessed) hourly EIA-930 balancing authority operations data, including the following 4 tables:

  • core_eia930__hourly_interchange
  • core_eia930__hourly_net_generation_by_energy_source
  • core_eia930__hourly_operations
  • core_eia930__hourly_subregion_demand

Docs are here, notebook poking at the data here.

Once the tables graduate to out_eia930_*, meaning they have been fully cleaned up and are production ready, we'll need to refactor build_demand to pull EIA930 data from PUDL.

Suggested Solution

  • [ ] Identify which PUDL tables / columns we need
  • [ ] Check in with Zane from Catalyst about when the out_ versions of the data will be available
  • [ ] Refactor workflow/scripts/build_demand.py to query tables for data
  • [ ] Update Snakefile to make build_demand task depend on forthcoming retrieve_pudl task (#311)

jpvelez avatar May 03 '24 22:05 jpvelez

One way to handle this, and to avoid duplicating work between this ticket and #314, would be to write a function in _helper.py that reads the PUDL SQL db file and extracts the EIA930 data into a pd.DataFrame in the format we currently import into both plot_validation_figures and build_demand.

plot_validation_figures queries the generation & imports/exports data columns of the 930 whereas build_demand queries the demand only.
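A minimal sketch of the shared helper idea, assuming the EIA930 operations data has already been loaded into a DataFrame. The function name `select_eia930` and the column names are hypothetical placeholders, not confirmed PUDL column names, and would need to be checked against the PUDL data dictionary:

```python
import pandas as pd

# ASSUMPTION: these column names are illustrative only; verify them against
# the PUDL data dictionary before relying on them.
KEY_COLS = ["datetime_utc", "balancing_authority_code_eia"]
CONSUMER_COLS = {
    # build_demand only needs demand
    "build_demand": ["demand_reported_mwh"],
    # plot_validation_figures needs generation and imports/exports
    "plot_validation_figures": [
        "net_generation_reported_mwh",
        "interchange_reported_mwh",
    ],
}


def select_eia930(operations: pd.DataFrame, consumer: str) -> pd.DataFrame:
    """Return only the EIA930 columns that one consumer script needs.

    Hypothetical helper: one loader feeds both scripts, and each script
    asks for its own subset of columns, sorted by timestamp and BA code.
    """
    if consumer not in CONSUMER_COLS:
        raise ValueError(f"unknown consumer: {consumer}")
    cols = KEY_COLS + CONSUMER_COLS[consumer]
    return operations.loc[:, cols].sort_values(KEY_COLS).reset_index(drop=True)


if __name__ == "__main__":
    # Tiny synthetic frame, not real EIA930 data.
    ops = pd.DataFrame(
        {
            "datetime_utc": pd.to_datetime(["2024-01-01 01:00", "2024-01-01 00:00"]),
            "balancing_authority_code_eia": ["CISO", "CISO"],
            "demand_reported_mwh": [2.0, 1.0],
            "net_generation_reported_mwh": [3.0, 2.0],
            "interchange_reported_mwh": [1.0, 1.0],
        }
    )
    print(select_eia930(ops, "build_demand"))
```

Keeping the column mapping in one place means a later change of data source should only touch the helper, not both scripts.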

ktehranchi avatar May 03 '24 23:05 ktehranchi

Our first round of work with GridLab is pretty much done now, and Ana and Elaine wanted the relatively unprocessed EIA-930, so I don't have a timeline for when a more cleaned up set of EIA-930 output tables will be available. Though if someone wanted to adapt one of the existing modules from Jacques at Stanford or the Open Grid Emissions project into a more processed EIA-930 table in PUDL, that would be useful. There's a lot of stuff going on in the EIA-930 that makes it complicated to use as-is depending on what you need. We could also try depending directly on https://github.com/jdechalendar/gridemissions/

Recently we got to the point where the number of multi-million row hourly tables was just too much for SQLite to be convenient, so now all the hourly outputs are only in Parquet. The best place to pull bulk PUDL outputs from is the PUDL AWS Open Data Registry S3 bucket (s3://pudl.catalyst.coop/). We haven't done a release with the new data from the GridLab project yet, so you'll only find them in /nightly for now.
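A rough sketch of reading one of the hourly tables from that bucket with pandas. It assumes one `<table>.parquet` file per table directly under the `nightly` prefix and anonymous read access, neither of which is confirmed in this thread. The function names are hypothetical, and the read itself needs `pyarrow` and `s3fs` installed:

```python
PUDL_BUCKET = "s3://pudl.catalyst.coop"  # bucket named in this thread

# The four EIA-930 tables listed in the issue description.
EIA930_TABLES = (
    "core_eia930__hourly_interchange",
    "core_eia930__hourly_net_generation_by_energy_source",
    "core_eia930__hourly_operations",
    "core_eia930__hourly_subregion_demand",
)


def pudl_parquet_uri(table: str, release: str = "nightly") -> str:
    """Build the S3 URI for one PUDL Parquet table.

    ASSUMPTION: files are laid out as <bucket>/<release>/<table>.parquet.
    """
    if table not in EIA930_TABLES:
        raise ValueError(f"unknown EIA-930 table: {table}")
    return f"{PUDL_BUCKET}/{release}/{table}.parquet"


def read_eia930_table(table: str, release: str = "nightly", columns=None):
    """Read one table into a DataFrame (requires network, pyarrow, s3fs)."""
    import pandas as pd  # imported lazily so the URI helper has no dependencies

    return pd.read_parquet(
        pudl_parquet_uri(table, release),
        columns=columns,  # read only the columns needed; these tables are large
        storage_options={"anon": True},  # ASSUMPTION: public, unsigned access
    )


if __name__ == "__main__":
    print(pudl_parquet_uri("core_eia930__hourly_operations"))
```

Passing `columns` keeps memory use down, since the hourly tables run to millions of rows.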

zaneselvans avatar May 04 '24 02:05 zaneselvans

Good to know. I think for our purposes it is useful to have Jacques's (GridEmissions) physics-based reconciliation of the EIA930 data, just to make sure some basics (like supply=demand) hold true in the data.

We are already pulling that data directly from his AWS S3 bucket. Unless, @jpvelez, you wanted to pull work from Jacques's GridEmissions tool into PUDL, I think we can table this ticket for now.
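To illustrate the kind of basic supply=demand check meant here, a small sketch that flags hours where demand does not equal net generation minus total interchange. This is only a sanity check, not the GridEmissions reconciliation itself. The column names are hypothetical, and the sign convention (positive interchange means net exports) is an assumption worth confirming against the data source:

```python
import pandas as pd


def balance_residual(
    ops: pd.DataFrame,
    demand: str = "demand_mwh",  # hypothetical column names
    generation: str = "net_generation_mwh",
    interchange: str = "interchange_mwh",
) -> pd.Series:
    """Residual of the balance D = NG - TI for each row.

    ASSUMPTION: positive interchange means net exports from the region.
    """
    return ops[demand] - (ops[generation] - ops[interchange])


def flag_imbalanced(ops: pd.DataFrame, tolerance_mwh: float = 1.0) -> pd.DataFrame:
    """Return rows whose balance residual exceeds the tolerance.

    Rows with missing values produce a NaN residual and are not flagged.
    """
    residual = balance_residual(ops)
    return ops[residual.abs() > tolerance_mwh]


if __name__ == "__main__":
    # Synthetic numbers, not real EIA930 data.
    ops = pd.DataFrame(
        {
            "demand_mwh": [100.0, 100.0],
            "net_generation_mwh": [120.0, 100.0],
            "interchange_mwh": [20.0, 20.0],
        }
    )
    print(flag_imbalanced(ops))  # only the second row is out of balance
```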

ktehranchi avatar May 06 '24 17:05 ktehranchi