Dazhong Xia
TODO:
- [ ] Still have the EIA-860 fields to do, but the EIA-923 ones appear to be behaving themselves.
- [ ] Pull the slowly-varying-fields check into a `pudl.validate`...
OK, so: if I put the fields into `FIELDS_METADATA`, I get a “fields not being used” error in the tests, because we don’t have these assets defined as resources....
By actually adding these schemas to our `RESOURCE_METADATA` & writing them out to the DB, I hooked them up to a lot of data quality checks they hadn't been subject to before. Some changes:...
After much debugging of the `_core_eia860__cooling_equipment` ↔️ `core_eia860__scd_plants` foreign key, which seemed to be fine, I realized the actual FK error in the tests was with the **EIA923** table instead,...
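One way to localize a foreign key failure like this is to ask SQLite directly which rows are orphaned. Below is a minimal sketch using the standard-library `sqlite3` module and SQLite's `PRAGMA foreign_key_check`. The `plants` and `generation` tables are hypothetical stand-ins, not the actual PUDL schemas, and `find_fk_violations` is an illustrative helper:

```python
import sqlite3
from typing import List, Tuple


def find_fk_violations(conn: sqlite3.Connection) -> List[Tuple[str, int, str]]:
    """Return (child table, child rowid, parent table) for each row whose foreign key has no match.

    PRAGMA foreign_key_check works even when FK enforcement is off, so it can
    audit data that was loaded without `PRAGMA foreign_keys = ON`.
    """
    rows = conn.execute("PRAGMA foreign_key_check").fetchall()
    # Each result row is (table, rowid, parent, fkid); drop the constraint index.
    return [(table, rowid, parent) for table, rowid, parent, _fk_id in rows]


# Hypothetical tables: generation.plant_id should always point at a row in plants.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE plants (plant_id INTEGER PRIMARY KEY);
    CREATE TABLE generation (
        plant_id INTEGER REFERENCES plants (plant_id),
        net_generation_mwh REAL
    );
    INSERT INTO plants VALUES (1);
    INSERT INTO generation VALUES (1, 10.0), (2, 5.0);  -- plant 2 does not exist
    """
)
print(find_fk_violations(conn))
```

Only the `generation` row pointing at the missing plant 2 is reported, and the result names the child table outright, which helps when the failing relationship is not the one you first suspected.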
So for each XBRL context (utility name, report year, ...; basically, the primary key), we could have multiple sets of facts from multiple XBRL filings. Currently we treat the...
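To gauge how often a context shows up in more than one filing, here is a minimal sketch that groups facts by context and flags the contexts with multiple filings. The flat `(context, filing_id, fact_name, value)` tuples and the `contexts_with_multiple_filings` helper are hypothetical simplifications, not PUDL's actual representation of the XBRL data:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical fact record: (context key, filing id, fact name, value).
# The context key stands in for the primary key, e.g. (utility name, report year).
Context = Tuple[str, int]
Fact = Tuple[Context, str, str, object]


def contexts_with_multiple_filings(facts: List[Fact]) -> Dict[Context, List[str]]:
    """Map each context that appears in more than one filing to its sorted filing ids."""
    filings_by_context: Dict[Context, Set[str]] = defaultdict(set)
    for context, filing_id, _name, _value in facts:
        filings_by_context[context].add(filing_id)
    return {
        context: sorted(filings)
        for context, filings in filings_by_context.items()
        if len(filings) > 1
    }


facts = [
    (("Utility A", 2021), "filing-1", "net_generation", 100),
    (("Utility A", 2021), "filing-2", "net_generation", 105),
    (("Utility B", 2021), "filing-3", "net_generation", 50),
]
print(contexts_with_multiple_filings(facts))
```

Here only Utility A's 2021 context is flagged, since it is the only one with facts from two filings.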
Did some more digging & thinking, screed ahead! Here are the possible ways to deduplicate:
* take the first one ("first snapshot")
* take the last one ("last snapshot")
* ...
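The two named strategies can be sketched as follows. This is a minimal illustration that assumes each fact carries the date of the filing it came from. The `Fact` record and `dedupe_snapshot` helper are hypothetical, not PUDL's implementation:

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class Fact(NamedTuple):
    context: Tuple[str, int]  # stand-in for the primary key, e.g. (utility name, report year)
    filed: date               # date of the filing this fact came from
    name: str                 # fact name, i.e. the output column
    value: object


def dedupe_snapshot(facts: List[Fact], keep: str = "last") -> Dict[Tuple[Tuple[str, int], str], object]:
    """Keep one value per (context, fact name), from the earliest ("first") or latest ("last") filing."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Order facts so the one we want to keep is visited last for each key.
    ordered = sorted(facts, key=lambda f: f.filed, reverse=(keep == "first"))
    result: Dict[Tuple[Tuple[str, int], str], object] = {}
    for fact in ordered:
        result[(fact.context, fact.name)] = fact.value  # later visits overwrite earlier ones
    return result


facts = [
    Fact(("Utility A", 2021), date(2022, 1, 1), "net_generation", 100),
    Fact(("Utility A", 2021), date(2022, 6, 1), "net_generation", 105),
]
print(dedupe_snapshot(facts, keep="first"), dedupe_snapshot(facts, keep="last"))
```

As written, a value from a later filing overwrites an earlier one even when the later value is null, which matters if later filings are partial diffs rather than full snapshots.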
After talking a bit more with @cmgosnell, let's do some more investigation to better understand the impact of changing our deduplication approach:
1. Let's take a look...
To find the values that are actually reported as null values, I looked for `xsi:nil="true"` in all the filings that might have 2021 data (e.g., the 2021 and 2022 filings). What...
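The same search can be done by walking each XBRL instance document. Below is a minimal sketch with the standard-library `xml.etree.ElementTree`. It assumes facts carry a `contextRef` attribute, as in standard XBRL instances. The `tx` taxonomy namespace in the sample and the `explicit_nil_facts` helper are made up for illustration:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree's spelling of the namespaced xsi:nil attribute.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def explicit_nil_facts(instance_xml: str) -> List[Tuple[str, str]]:
    """Return (fact name, contextRef) for every element explicitly marked xsi:nil."""
    root = ET.fromstring(instance_xml)
    found = []
    for elem in root.iter():
        # XML Schema booleans accept "true" or "1".
        if elem.get(XSI_NIL) in ("true", "1"):
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
            found.append((local_name, elem.get("contextRef", "")))
    return found


sample = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:tx="http://example.com/hypothetical-taxonomy">
  <tx:NetGeneration contextRef="c-1" xsi:nil="true"/>
  <tx:PlantName contextRef="c-1">Plant A</tx:PlantName>
</xbrl>"""
print(explicit_nil_facts(sample))
```

A fact that was never reported has no element at all, so this separates reported nulls from missing values in a way a plain null check on the converted tables cannot.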
Unfortunately, it seems like we don't have a *super* clean distribution to differentiate between "definitely a diff" and "definitely a snapshot" - here's the distribution of "number of non-null data...
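A distribution like this can be tabulated with a couple of counters. This minimal sketch assumes the count is taken per filing over `(filing_id, value)` pairs, with `None` standing for null. Both the input shape and the `non_null_count_distribution` helper are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def non_null_count_distribution(rows: Iterable[Tuple[str, Optional[object]]]) -> Dict[int, int]:
    """Map "number of non-null values in a filing" to "how many filings have that count"."""
    per_filing: Counter = Counter()
    for filing_id, value in rows:
        per_filing[filing_id] += 0  # register the filing so all-null filings count as 0
        if value is not None:
            per_filing[filing_id] += 1
    return dict(Counter(per_filing.values()))


rows = [("f1", 1), ("f1", None), ("f2", None), ("f3", 2), ("f3", 3)]
print(non_null_count_distribution(rows))
```

Sparse filings would pile up at the low end of this distribution and full snapshots at the high end; the difficulty described above is that the two ends do not separate cleanly.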
To be fair, that’s sort of our fault for not distinguishing between non-reported values and reported null values in our SQLite conversion. Though the FERC filings might be weird in...