
Create EIA API v2 fuel price archiving script

Open zaneselvans opened this issue 2 years ago • 1 comments

The EIA API contains data that's not directly available from the spreadsheets they publish, including aggregate fuel prices which include redacted fuel deliveries to IPPs/merchant generators. The API itself isn't completely reliable, and we really shouldn't have to download the same data over and over again.

To provide reliable access to this information within PUDL:

  • [ ] Create a Python script that downloads the entire history of aggregate fuel prices ($/mmbtu) and quantities delivered (mmbtu), broken down by time step, geographic area, fuel type, and industrial sector. This script can be modeled after the EPA CEMS scraper, which works outside of Scrapy (since it's pulling from the API directly). Store the archived data in as close to its original form as possible, probably as a single zipped JSON file. Ideally the script should be written so that we can easily add other EIA API data series to the archive if we need them in the future.
  • [ ] Integrate this script into the pudl-scrapers repo alongside the other non-scrapy script that we use to download the EPA CEMS data from their janky FTP server.
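
A minimal sketch of what the archiving loop might look like, under several assumptions that are not confirmed in this issue: that the API is paged with an offset and a page length, that each response is a JSON object with its rows under `response.data`, and that 5000 rows per page is an acceptable page size. The names `iter_pages`, `archive_series`, and the per-series `fetch_page` callables are hypothetical. The HTTP call is deliberately left to the caller so the paging and archiving logic can be tested without touching the API.

```python
import json
import zipfile
from typing import Callable, Dict, Iterator

# Assumption: rows per request. Adjust to whatever limit the API enforces.
PAGE_SIZE = 5000

# A fetcher takes (offset, length) and returns one parsed JSON response.
Fetcher = Callable[[int, int], dict]


def iter_pages(fetch_page: Fetcher, page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield raw response pages until the API returns a short or empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        # Assumption: rows live under response.data in each page.
        rows = page.get("response", {}).get("data", [])
        if not rows:
            break
        yield page
        if len(rows) < page_size:
            break
        offset += page_size


def archive_series(
    series: Dict[str, Fetcher], out_path: str, page_size: int = PAGE_SIZE
) -> Dict[str, int]:
    """Write every page of every series, unmodified, into one zip archive.

    Returns the number of rows archived per series name.
    """
    counts: Dict[str, int] = {}
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, fetch_page in series.items():
            n_rows = 0
            for i, page in enumerate(iter_pages(fetch_page, page_size)):
                # Store the response as received, one JSON member per page.
                zf.writestr(f"{name}/page_{i:05d}.json", json.dumps(page))
                n_rows += len(page["response"]["data"])
            counts[name] = n_rows
    return counts
```

Adding another data series would then only mean adding another entry to the `series` mapping. Note that this writes one JSON member per page inside a single zip rather than one merged JSON file; merging pages would be a small change, at the cost of moving slightly further from the original form of the responses.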

zaneselvans avatar Jul 18 '22 22:07 zaneselvans

There are multiple possible sources of this data: the EIA API, the EIA API bulk data downloader, and the EIA Electric Power Monthly (EPM). I already compared the API vs EPM in #1712.

After looking into the API and bulk downloader, there are differences there as well:

  • The API has many more fuel categories (45) than the bulk download (7). But many of the extra categories are irrelevant (renewables, nuclear, hydro, etc.) and I'm not sure how many have actual data.
  • The API has one additional sector: Electric Power Sector Non-CHP
  • The GUI API explorer doesn't let you select quarterly resolution for fuel receipts and costs, but it is present in both the bulk data and the actual API.

I assume the extra fuel categories are mostly nulls, but I'll check. Under that assumption, I think the operational advantages of the bulk download outweigh the few extra categories from the API.
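
One way the null check could be done, as a rough sketch: compute the share of non-null values per fuel category over the downloaded rows. The function name `non_null_share` and the `fuel` and `price` keys in the example are hypothetical; the real field names would need to be read off the API responses.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping


def non_null_share(
    rows: Iterable[Mapping], category_key: str, value_key: str
) -> Dict[Hashable, float]:
    """Return the fraction of rows with a non-null value, per category."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for row in rows:
        category = row.get(category_key)
        totals[category] += 1
        # A missing key and an explicit null both count as null here.
        if row.get(value_key) is not None:
            filled[category] += 1
    return {category: filled[category] / totals[category] for category in totals}


# Hypothetical rows; the key names are placeholders, not the API's fields.
example_rows = [
    {"fuel": "COL", "price": 2.1},
    {"fuel": "COL", "price": None},
    {"fuel": "NUC", "price": None},
]
shares = non_null_share(example_rows, "fuel", "price")
```

Categories with a share at or near zero would be the ones that add nothing over the bulk download.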

TrentonBush avatar Jul 20 '22 22:07 TrentonBush

@TrentonBush Is this issue done?

zaneselvans avatar Aug 30 '22 01:08 zaneselvans