pudl icon indicating copy to clipboard operation
pudl copied to clipboard

Make retirement status consistent with retirement dates during last year of operation

Open TrentonBush opened this issue 2 years ago • 0 comments

In the EIA 860 Generators table (and dependencies), generators with mid-year retirement_date are classified as operational_status = 'retired'. This means I can't use operational_status to filter out retired plants without losing mid-year data. This can lead to strange analysis results, such as a solar PV plant (plant_id_eia = 10029) having a half million MMBTUs of gas consumption in 2020 (in reality it was co-located with a gas plant that retired mid-year).

It makes sense that an annual classification can't exactly match a higher resolution monthly classification. But why have two different resolutions at all? I think that invites misinterpretation.

Describe the solution you'd like Change the operational_status of plants that retired mid-year to existing. This fixes the mid year data loss when using operational_status to filter out retired plants. At first I thought there would be a symmetric problem when filtering FOR retirements, but that's actually not the case: you can't select for retired plants after their retirement date (the data doesn't exist).

Describe alternatives you've considered

  • Rename operational_status to operational_status_at_year_end or something that hints at the temporal nature of the classification.
  • I wanted to drop operational_status entirely, but there are 8800 records with null retirement_date but operational_status = 'retired', so it provides some additional information.

Additional context A table showing how operational_status and retirement_date IS NULL align.

retire_date_is_null operational_status count(*)
NOT NULL existing 4
NOT NULL retired 47666
NULL NULL 81
NULL BU 332
NULL existing 401007
NULL proposed 32934
NULL retired 8800

Created with:

SELECT
    (CASE retirement_date is NULL
        WHEN TRUE THEN 'NULL'
        ELSE 'NOT NULL'
    END) as retire_date_is_null,
    operational_status,
    count(*)
FROM generators_eia860
group by 1, 2
;

TrentonBush avatar Jun 16 '22 23:06 TrentonBush