open-grid-emissions
Ensure complete `subplant_id` mapping
Currently, subplant IDs are only created for units that exist in both CEMS and EIA-923, so some generators/units end up with a subplant ID of NaN.
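To show why NaN subplant IDs matter downstream, here is a minimal pandas sketch. By default, `groupby` silently drops rows whose key is NaN. Passing `dropna=False` (available since pandas 1.1) keeps them as their own group. The data below is hypothetical and only illustrates the behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical unit-level data: the third row is an EIA generator with no
# CEMS match, so its subplant_id is NaN.
units = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "subplant_id": [0, 0, np.nan],
        "net_generation_mwh": [100.0, 50.0, 25.0],
    }
)

keys = ["plant_id_eia", "subplant_id"]

# Default behavior: the row with a NaN key is dropped from the result.
default = units.groupby(keys)["net_generation_mwh"].sum()

# dropna=False keeps NaN keys as a separate group, so no generation is lost.
kept = units.groupby(keys, dropna=False)["net_generation_mwh"].sum()

print(default.sum())  # 150.0 (the unmatched generator's 25 MWh is missing)
print(kept.sum())     # 175.0
```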
- [x] Ensure that all merge and groupby functions that use `subplant_id` as one of the keys do not drop observations with missing subplant values.
- [ ] Although the primary purpose of the subplant ID is to group CEMS units with EIA generators and boilers, it could also be useful for grouping EIA boilers and generators that do not exist in CEMS. We should update the `pudl.analysis.epa_crosswalk` code to generate subplant IDs for all boilers/generators in the EIA data, regardless of whether they have CEMS data.
- [ ] If any subplant values are still missing after that, we could fill them with a code of 99: a non-missing value that would not overlap with any subplant IDs already assigned during the crosswalk process.
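The two open tasks can be sketched as follows, under stated assumptions. This does not reproduce `pudl.analysis.epa_crosswalk`. It uses a plain union-find (connected components) over a hypothetical boiler-generator association table with columns `plant_id_eia`, `boiler_id` and `generator_id`, so that every EIA generator gets a subplant ID whether or not it appears in CEMS. The helper names `assign_eia_subplants` and `fill_missing_subplants` are hypothetical, and 99 is the sentinel proposed above.

```python
import pandas as pd

# Sentinel proposed in the issue; must not collide with real subplant IDs.
SUBPLANT_FILL_VALUE = 99


def assign_eia_subplants(bga: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: give every EIA boiler/generator row a subplant_id.

    Boilers and generators linked directly or through a chain of associations
    share a subplant. IDs are numbered from 0 within each plant, in order of
    first appearance.
    """
    parent = {}

    def find(node):
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root.
        while parent[node] != root:
            nxt = parent[node]
            parent[node] = root
            node = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    gen_nodes = []
    for plant, boiler, gen in zip(bga["plant_id_eia"], bga["boiler_id"], bga["generator_id"]):
        # Tag node type so boiler "1" and generator "1" stay distinct.
        gen_node = (plant, "generator", gen)
        parent.setdefault(gen_node, gen_node)
        if not pd.isna(boiler):  # generators without a boiler form their own group
            boiler_node = (plant, "boiler", boiler)
            parent.setdefault(boiler_node, boiler_node)
            union(boiler_node, gen_node)
        gen_nodes.append(gen_node)

    # Number the connected components within each plant.
    component_to_id, next_id, subplant_ids = {}, {}, []
    for gen_node in gen_nodes:
        root = find(gen_node)
        plant = gen_node[0]
        if root not in component_to_id:
            component_to_id[root] = next_id.get(plant, 0)
            next_id[plant] = component_to_id[root] + 1
        subplant_ids.append(component_to_id[root])

    out = bga.copy()
    out["subplant_id"] = subplant_ids
    return out


def fill_missing_subplants(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace remaining NaN subplant_id values with 99."""
    if (df["subplant_id"] == SUBPLANT_FILL_VALUE).any():
        raise ValueError("subplant_id 99 is already in use; choose another fill value")
    out = df.copy()
    out["subplant_id"] = out["subplant_id"].fillna(SUBPLANT_FILL_VALUE).astype("int64")
    return out


# Plant 1: B1 feeds G1 and G2, B2 feeds G3, G4 has no boiler. Plant 2: one pair.
bga = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1, 2],
        "boiler_id": ["B1", "B1", "B2", None, "B1"],
        "generator_id": ["G1", "G2", "G3", "G4", "G1"],
    }
)
print(assign_eia_subplants(bga)["subplant_id"].tolist())  # [0, 0, 1, 2, 0]

# Hypothetical units still lacking a subplant after the crosswalk.
leftover = pd.DataFrame({"plant_id_eia": [3, 3], "subplant_id": [0.0, float("nan")]})
print(fill_missing_subplants(leftover)["subplant_id"].tolist())  # [0, 99]
```

The explicit collision check in `fill_missing_subplants` guards the assumption behind the sentinel: a fixed code like 99 is only safe if no plant ever reaches that many crosswalk-assigned subplants.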