Adding remaining tabs from 860 EnviroAssoc to PUDL
Initial work on incorporating the remaining tabs of EIA 860 6_1 EnviroAssoc into PUDL. The code as well as this note is a work in progress...
Extraction
The extraction step should be working for each of the following sheets.
- There is a set of sheets that each associate one kind of emission control device ID with the relevant plant and boiler:
  - Cooling (2009-)
  - Particulate Matter (2013-)
  - NOx (2013-)
  - SO2 (2013-)
  - Mercury (2013-)
  - Stack / Flue (2009-)
  - FGD (2009-2012)
  - FGP (2009-2012)
- An Emission Control Equipment sheet (2013-) that provides type, status, cost, install and retirement dates, and some other info for Particulate Matter, NOx, SO2 and Mercury control equipment installed at each plant.
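As a rough illustration of the year coverage listed above, the sketch below records each sheet's reporting range and looks up which sheets should exist for a given report year. The dictionary keys and the `sheets_for_year` helper are made-up labels for this note, not the page names or functions actually used in PUDL.

```python
# Year coverage per sheet, transcribed from the list above.
# Keys are illustrative labels only, NOT the real PUDL page names.
# An end year of None means the sheet is still being reported.
SHEET_YEARS = {
    "cooling": (2009, None),
    "particulate_matter": (2013, None),
    "nox": (2013, None),
    "so2": (2013, None),
    "mercury": (2013, None),
    "stack_flue": (2009, None),
    "fgd": (2009, 2012),
    "fgp": (2009, 2012),
    "emission_control_equipment": (2013, None),
}


def sheets_for_year(year):
    """Return the labels of the sheets expected to exist for a report year."""
    return [
        name
        for name, (start, end) in SHEET_YEARS.items()
        if year >= start and (end is None or year <= end)
    ]


print(sheets_for_year(2010))  # ['cooling', 'stack_flue', 'fgd', 'fgp']
```

Keeping the ranges in one place makes the 2012/2013 reporting change explicit: FGD and FGP drop out after 2012, and the pollutant-specific sheets plus the equipment sheet appear in 2013.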
Transformation
For our current use case, what we really want is to combine all the sheets listed above so that you can see which emission control equipment is associated with a given boiler. A rough first attempt at this is the emission_control_equip transformation function.
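One way the combination described above could look is sketched here with pandas. This is not the emission_control_equip function from the PR. The column names (`report_year`, `plant_id_eia`, `boiler_id`, `control_id`, `control_type`) and the `combine_associations` helper are assumptions made for illustration, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical key columns; the real PUDL column names may differ.
KEY_COLS = ["report_year", "plant_id_eia", "boiler_id"]


def combine_associations(sheets):
    """Stack per-device association sheets into one long table.

    `sheets` maps a device label (e.g. "nox") to a DataFrame holding the
    key columns plus exactly one device ID column, whatever it is called.
    """
    frames = []
    for device_type, df in sheets.items():
        id_cols = [c for c in df.columns if c not in KEY_COLS]
        if len(id_cols) != 1:
            raise ValueError(f"{device_type}: expected one ID column, got {id_cols}")
        frames.append(
            # Give every sheet the same ID column name and remember its origin.
            df.rename(columns={id_cols[0]: "control_id"})
            .assign(control_type=device_type)
        )
    return (
        pd.concat(frames, ignore_index=True)
        .sort_values(KEY_COLS + ["control_type", "control_id"])
        .reset_index(drop=True)
    )


# Invented sample rows, for illustration only.
nox = pd.DataFrame(
    {"report_year": [2013], "plant_id_eia": [1], "boiler_id": ["B1"],
     "nox_control_id": ["N1"]}
)
so2 = pd.DataFrame(
    {"report_year": [2013], "plant_id_eia": [1], "boiler_id": ["B1"],
     "so2_control_id": ["S1"]}
)
combined = combine_associations({"nox": nox, "so2": so2})
print(combined)
```

A long format (one row per boiler and device) avoids having a separate ID column per pollutant, so filtering on one boiler returns every associated device regardless of which sheet it came from.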
Hey @cmgosnell do we have a plan for getting this PR integrated? Would be a great complement to the complete and up-to-date mapping of the early release data and all the previously missing columns.
@arengel are there concrete remaining tasks that we or y'all can take on here?
What's the current status of this PR? I may have some time to work on https://github.com/catalyst-cooperative/pudl/issues/1162 over the next month but want to see where I could be the most helpful.
As of now this really just has the column and file mappings to enable extraction of the EnviroAssoc and EnviroEquip 860 files. There is also a WIP transformation function for combining the EnviroAssoc tables, as well as some futzing around with metadata that is probably wrong.
To get this data into a form that would be easily usable requires a transformation that links the association data to the equipment data and deals with how the reporting changed over time (columns moving between tables, etc). Unfortunately, no one from our team has capacity in the near future to work on this. It's also the sort of thing where the comprehensive solution is not a high priority for the Hub.
As for what to do with it now, if you are picking up #1162, this PR gets you a good chunk of the way through the extraction step. Otherwise, is it acceptable to merge in just the package data so that we can run the extraction, and then down the road someone can pick back up the rest of the ETL?
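For anyone picking this up, the linking step mentioned above might start from something like the following sketch. It assumes both the association data and the equipment data have already been reshaped to one row per control device with shared key columns, which is not yet true of the extracted tables. The `LINK_COLS` names, the `link_equipment` helper, and the sample rows are all hypothetical.

```python
import pandas as pd

# Hypothetical join keys; assumes both tables are one row per device.
LINK_COLS = ["report_year", "plant_id_eia", "control_type", "control_id"]


def link_equipment(assoc, equip):
    """Left-join equipment details onto boiler/device associations.

    Associations without a matching equipment row are kept and flagged,
    which matters because the equipment sheet only starts in 2013.
    """
    linked = assoc.merge(
        equip,
        on=LINK_COLS,
        how="left",
        validate="many_to_one",  # many boilers may share one device
        indicator=True,
    )
    linked["has_equipment_record"] = linked["_merge"] == "both"
    return linked.drop(columns="_merge")


# Invented sample rows, for illustration only.
assoc = pd.DataFrame(
    {
        "report_year": [2012, 2013],
        "plant_id_eia": [1, 1],
        "boiler_id": ["B1", "B1"],
        "control_type": ["fgd", "so2"],
        "control_id": ["F1", "S1"],
    }
)
equip = pd.DataFrame(
    {
        "report_year": [2013],
        "plant_id_eia": [1],
        "control_type": ["so2"],
        "control_id": ["S1"],
        "equipment_status": ["operating"],
    }
)
linked = link_equipment(assoc, equip)
print(linked)
```

The left join keeps pre-2013 associations visible instead of silently dropping them, so the gaps caused by the reporting change can be inspected rather than hidden.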
I'm picking up on the integration process here. I've migrated the changes from this PR into a new branch / draft PR, which can be found here.