open-grid-emissions

Ensure complete `subplant_id` mapping

Open grgmiller opened this issue 2 years ago • 9 comments

Currently, subplant IDs are only created for units that exist both in CEMS and EIA-923, meaning that there are certain generators/units that have a subplant ID of NaN.

  • [x] Ensure that all merge and groupby functions that use `subplant_id` as one of the keys are not dropping observations with missing subplant values.
  • [ ] Although the primary purpose of the subplant ID is to group CEMS units with EIA generators and boilers, it could also be useful for grouping EIA boilers and generators that do not exist in CEMS. We should update the `pudl.analysis.epa_crosswalk` code to generate subplant IDs for all boilers/generators that exist in the EIA data, regardless of whether data exists in CEMS.
  • [ ] If any subplant values are still missing after that, we should perhaps fill them with a code of 99, so that every record has a non-missing code that does not overlap with any subplant IDs already assigned during the crosswalk process.
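
To illustrate the first and third items, here is a minimal pandas sketch. It is not the project's actual code: the toy data, the columns `plant_id_eia` and `net_generation_mwh`, and the constant `MISSING_SUBPLANT_CODE` are assumptions made up for the example. It shows how a default `groupby` drops rows with a missing `subplant_id`, how `dropna=False` keeps them, and how filling with 99 would behave.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the subplant_id column name comes from the issue.
df = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2],
        "subplant_id": [0.0, 0.0, np.nan, np.nan],
        "net_generation_mwh": [10.0, 20.0, 5.0, 7.0],
    }
)

keys = ["plant_id_eia", "subplant_id"]

# Default behavior: rows whose group key is NaN are silently dropped.
dropped = df.groupby(keys, as_index=False)["net_generation_mwh"].sum()

# dropna=False (pandas >= 1.1) keeps NaN as its own group.
kept = df.groupby(keys, dropna=False, as_index=False)["net_generation_mwh"].sum()

# Alternative: fill remaining gaps with the sentinel code proposed above.
MISSING_SUBPLANT_CODE = 99  # assumed name for the proposed fill value
filled = df.assign(subplant_id=df["subplant_id"].fillna(MISSING_SUBPLANT_CODE))
filled_sum = filled.groupby(keys, as_index=False)["net_generation_mwh"].sum()
```

Comparing the total of `net_generation_mwh` before and after each groupby is a quick way to check whether any observations were lost.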

grgmiller • Jun 07 '22 17:06