pudl
Integrate the new FERC 1 XBRL archive into the PUDL Datastore.
Background
In order to ingest XBRL data into PUDL, we need a datastore that can interpret XBRL archives (#1593). Each archive consists of a set of XBRL filings plus metadata pulled from the RSS feed and stored in a JSON file. The metadata lists the filings submitted by each filer for a given year and period, along with additional info such as the date-time each filing was submitted. This is required because filers can resubmit filings at any point in time, so there may be multiple filings for a filer for a specific year/period, and PUDL must know which filing to use.
Design
The datastore will open the metadata file and find the most recent filing for every filer/year/period combo. We will assume that the most recent filing is the best one to process. It will then read these files into in-memory buffers, which will be passed to the XBRL extractor.
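As a rough illustration of the selection step described above, here is a minimal sketch, not the actual PUDL implementation. It assumes the archive is a zip file containing a `metadata.json` whose entries carry `filer_id`, `year`, `period`, `submitted` (an ISO 8601 timestamp), and `filename` fields. All of these names, and the two helper functions, are hypothetical; the real metadata layout may differ.

```python
import io
import json
import zipfile
from datetime import datetime


def latest_filings(metadata):
    """Pick the most recent filing for each (filer, year, period) combo.

    `metadata` is a list of dicts; the field names are assumptions.
    """
    latest = {}
    for filing in metadata:
        key = (filing["filer_id"], filing["year"], filing["period"])
        submitted = datetime.fromisoformat(filing["submitted"])
        # Keep this filing only if it is newer than what we have for the key.
        if key not in latest or submitted > latest[key][0]:
            latest[key] = (submitted, filing)
    return {key: filing for key, (_, filing) in latest.items()}


def load_filings(archive):
    """Read the selected filings into in-memory buffers.

    `archive` is a path or file-like object for a zip archive
    (hypothetical layout: metadata.json plus one file per filing).
    """
    with zipfile.ZipFile(archive) as zf:
        metadata = json.loads(zf.read("metadata.json"))
        selected = latest_filings(metadata)
        return {
            key: io.BytesIO(zf.read(filing["filename"]))
            for key, filing in selected.items()
        }
```

The returned dictionary maps each filer/year/period combo to a buffer that could be handed to the XBRL extractor. Ties on submission time keep the first filing encountered, which is an arbitrary choice in this sketch.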
is this finished? or is it finished enough in the xbrl_integration branch?
This is so finished.