Create EIA API v2 fuel price archiving script
The EIA API contains data that's not directly available from the spreadsheets they publish, including aggregate fuel prices which include redacted fuel deliveries to IPPs/merchant generators. The API itself isn't completely reliable, and we really shouldn't have to download the same data over and over again.
To provide reliable access to this information within PUDL:
- [ ] Create a Python script which will download the entire history of aggregate fuel prices ($/mmbtu) and quantities delivered (mmbtu), broken down by time step, geographic area, fuel type, and industrial sector. This script can be modeled after the EPA CEMS scraper, which works outside of Scrapy (since it's pulling from the API directly). Store the archived data in as close to its original form as possible, probably a single zipped JSON file. Ideally this script should be written such that we can easily add other data series to archive from the EIA API if we need them in the future as well.
- [ ] Integrate this script into the `pudl-scrapers` repo alongside the other non-Scrapy script that we use to download the EPA CEMS data from their janky FTP server.
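A rough sketch of what the archiving loop could look like, not a finished implementation. It assumes the v2 API pages results with `offset`/`length` query parameters and returns rows under `response.data` with a row count in `response.total`; the function names, the `route` argument, and the `page_NNNNN.json` naming are all hypothetical. Each raw page is written unmodified into a single zip so the archive stays close to the original form.

```python
import json
import urllib.parse
import urllib.request
import zipfile
from typing import Callable


def make_eia_fetcher(route: str, api_key: str, base_url: str = "https://api.eia.gov/v2"):
    """Build a page fetcher for one API route (route is supplied by the caller)."""

    def fetch_page(offset: int, length: int) -> dict:
        # Assumption: the API accepts api_key, offset and length as query parameters.
        query = urllib.parse.urlencode(
            {"api_key": api_key, "offset": offset, "length": length}
        )
        with urllib.request.urlopen(f"{base_url}/{route}/data/?{query}") as resp:
            return json.load(resp)

    return fetch_page


def archive_pages(
    fetch_page: Callable[[int, int], dict], zip_path: str, page_size: int = 5000
) -> int:
    """Page through a route and store every raw response in one zip file.

    Returns the number of pages written.
    """
    offset = 0
    n_pages = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        while True:
            payload = fetch_page(offset, page_size)
            # Assumption: rows live under response.data, row count under response.total.
            rows = payload["response"]["data"]
            if not rows:
                break
            # Store the payload as received, one JSON member per page.
            archive.writestr(f"page_{n_pages:05d}.json", json.dumps(payload))
            n_pages += 1
            offset += len(rows)
            if offset >= int(payload["response"]["total"]):
                break
    return n_pages
```

Passing the fetcher in as an argument keeps the archiving loop independent of any particular route, so other EIA data series could be archived later by building another fetcher with a different `route`.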
There are multiple possible sources of this data: the EIA API, the EIA API bulk data downloader, and the EIA Electric Power Monthly (EPM). I already compared the API vs EPM in #1712.
After looking into the API and bulk downloader, there are differences there as well:
- The API has many more fuel categories (45) than the bulk download (7). But many of the extra categories are irrelevant (renewables, nuclear, hydro, etc.) and I'm not sure how many have actual data.
- The API has one additional sector: Electric Power Sector Non-CHP.
- The GUI API explorer doesn't let you select quarterly resolution for fuel receipts and costs, but it is present in both the bulk data and the actual API.
I assume the extra fuel categories are mostly nulls, but I'll check. Under that assumption, I think the operational advantages of the bulk download outweigh the few extra categories from the API.
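One way the null check could be done, as a minimal sketch: tally the share of missing values per fuel category over the downloaded records. The key names `fueltypeid` and `price` are placeholders rather than confirmed field names, and the function name is hypothetical.

```python
from collections import defaultdict


def null_share_by_category(records, category_key="fueltypeid", value_key="price"):
    """Return {category: fraction of records whose value is missing}."""
    counts = defaultdict(lambda: [0, 0])  # category -> [n_null, n_total]
    for rec in records:
        tally = counts[rec[category_key]]
        tally[1] += 1
        # A missing key and an explicit null are both counted as null.
        if rec.get(value_key) is None:
            tally[0] += 1
    return {cat: n_null / n_total for cat, (n_null, n_total) in counts.items()}
```

Categories whose share comes out at or near 1.0 would be the ones with no actual data.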
@TrentonBush Is this issue done?