Create infrastructure for publishing all raw FERC SQLite DBs extracted from XBRL data
Our FERC XBRL Extractor works with FERC Forms 1, 2, 6, 60, and 714. Form 1 is the most thoroughly integrated with PUDL, but even for Form 1 many tables have not yet been integrated into the ETL, so the ability to publish the raw SQLite DBs will be very useful. The XBRL data is also much better structured than the historical data, but it is very hard to work with in the XBRL format, so raw SQLite versions of all of these forms could provide a lot of value.
Ingest metadata generated by the extraction tool
- [ ] The FERC XBRL Extractor can generate a Frictionless Data Package using metadata extracted from the FERC taxonomy. This will allow us to publish each database with column-level descriptions provided by FERC.
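A minimal sketch of how the extracted datapackage descriptor could be turned into Datasette column descriptions. It assumes each resource carries an inline Frictionless Table Schema (`resources[].schema.fields[]` with `name` and `description`) and targets Datasette's `metadata.json` layout, which accepts a per-table `description` and a `columns` mapping of column name to description. The function name `datapackage_to_datasette_metadata`, the database name, and the toy descriptor are hypothetical, not part of PUDL or the extractor.

```python
from __future__ import annotations

import json
from typing import Any


def datapackage_to_datasette_metadata(
    datapackage: dict[str, Any], db_name: str
) -> dict[str, Any]:
    """Map a Frictionless Data Package descriptor onto Datasette metadata.

    Assumes each resource looks like
    {"name": ..., "description": ..., "schema": {"fields": [{"name": ..., "description": ...}]}}
    with the schema inlined rather than referenced by path.
    """
    tables: dict[str, Any] = {}
    for resource in datapackage.get("resources", []):
        fields = resource.get("schema", {}).get("fields", [])
        # Keep only columns that actually carry a description.
        columns = {
            field["name"]: field["description"]
            for field in fields
            if field.get("description")
        }
        table: dict[str, Any] = {"columns": columns}
        if resource.get("description"):
            table["description"] = resource["description"]
        tables[resource["name"]] = table
    return {"databases": {db_name: {"tables": tables}}}


if __name__ == "__main__":
    # Toy descriptor for illustration only; real descriptors come from the extractor.
    toy = {
        "resources": [
            {
                "name": "example_table",
                "description": "Example table description",
                "schema": {
                    "fields": [
                        {"name": "example_column", "description": "Example column description"},
                        {"name": "undocumented_column"},
                    ]
                },
            }
        ]
    }
    metadata = datapackage_to_datasette_metadata(toy, "ferc1_xbrl")
    print(json.dumps(metadata, indent=2))
```

Columns without a description are skipped so that Datasette falls back to showing just the column name rather than an empty string.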
Enable publication
- [ ] Integrate new sources with
- [ ] Update datasette publication bash script
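A rough sketch of the arguments the publication script might pass to Datasette once the XBRL databases exist. It assumes one SQLite file per form named like `ferc1_xbrl.sqlite` (a hypothetical naming convention), a combined `metadata.json`, and deployment with `datasette publish cloudrun`. The helper names, output directory, and service name are placeholders, not the actual PUDL script.

```python
from __future__ import annotations

import shlex
from pathlib import Path
from typing import Sequence

# Forms supported by the FERC XBRL Extractor, per this issue.
XBRL_FORMS: tuple[str, ...] = ("1", "2", "6", "60", "714")


def xbrl_db_paths(output_dir: Path, forms: Sequence[str] = XBRL_FORMS) -> list[Path]:
    """Return one SQLite path per form (hypothetical naming convention)."""
    return [output_dir / f"ferc{form}_xbrl.sqlite" for form in forms]


def datasette_publish_command(
    dbs: Sequence[Path], metadata: Path, service: str
) -> list[str]:
    """Build an argv list for `datasette publish cloudrun`."""
    return [
        "datasette",
        "publish",
        "cloudrun",
        *(str(db) for db in dbs),
        "--metadata",
        str(metadata),
        "--service",
        service,
    ]


if __name__ == "__main__":
    # Placeholder paths and service name; adjust to the real deployment.
    dbs = xbrl_db_paths(Path("pudl_output"))
    cmd = datasette_publish_command(dbs, Path("metadata.json"), "pudl-datasette")
    # Print a shell-quoted command that a bash script could run.
    print(shlex.join(cmd))
```

Building an argv list rather than a single string avoids quoting problems, and `shlex.join` (Python 3.8+) renders it safely for a shell script.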
It seems like something we might as well do. The DBF data is going to be messier and we won't be able to provide the same level of documentation, but it would still be more accessible than in the DBF format.
I suppose each form would have to have 2 databases (one from the XBRL data and one from the historical data). How big of a lift would this be? @zaneselvans, which forms here are most valuable?
The only forms we've said we would integrate historical data for are 1, 2, and 714.
The Form 2 is analogous to Form 1 but for interstate natural gas utilities, so mostly transmission pipeline companies. We'd hoped there would be more state-level gas utilities in there, as there are for electric utilities, but it seems like that's not the case.
The old Form 714 is partially integrated, and provides a bunch of data about balancing and planning areas, including hourly demand. The old data is a bunch of CSVs dumped from DBF, with all years in one partition.
So I think those are the highest priority, and the old 714 data will be easier to work with.
IIRC, Form 6 is like Forms 1 & 2, but for petroleum, and the old data is DBF. I think that would be the next priority. Form 60 seemed like a mysterious "other entities" category, and would be the lowest priority.
I imagine having the XBRL databases will make it easier to interpret the old data.
PUDL is now able to construct SQLite DBs from all FERC XBRL forms, and ingest/convert the accompanying datapackage descriptors. We have not yet published these DBs on datasette, but the infrastructure is all in place on the xbrl_integration