powerplantmatching
Erroneous `Fueltype` and `Technology` combinations in base dataset
Version Checks (indicate both or one)
- [x] I have confirmed this bug exists on the latest release of powerplantmatching.
- [x] I have confirmed this bug exists on the current `master` branch of powerplantmatching.
Issue Description
Some `Fueltype`/`Technology` combinations in the dataset do not make much sense.
Some examples:
- Natural gas assigned to Onshore (wind) and PV capacity
- Wind-driven PV

I've seen this happen for Lignite, Natural Gas, Wind, and Oil.
Reproducible Example
```python
import powerplantmatching as ppm

plants = ppm.powerplants()
# List the technologies attached to each fuel type
print(plants[plants.Fueltype == "Wind"].Technology.unique())
print(plants[plants.Fueltype == "Natural Gas"].Technology.unique())
```
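The checks above look at one fuel type at a time. A minimal sketch for surveying every `Fueltype`/`Technology` pair at once, together with the capacity behind each pair, could group on both columns. The toy frame below is a made-up stand-in for `ppm.powerplants()`, and the capacities are assumed to be in MW; only the `Fueltype`, `Technology`, and `Capacity` column names are taken from the dataset.

```python
import pandas as pd

# Toy stand-in for ppm.powerplants(); Capacity values are made up (MW assumed).
plants = pd.DataFrame({
    "Fueltype": ["Wind", "Wind", "Wind", "Natural Gas", "Natural Gas", "Natural Gas"],
    "Technology": ["Onshore", "Offshore", None, "CCGT", "Onshore", "PV"],
    "Capacity": [50.0, 300.0, 10.0, 400.0, 2.0, 1.5],
})

# Make missing technologies visible (groupby drops NaN keys by default),
# then count plants and sum capacity for every (Fueltype, Technology) pair.
pairs = (
    plants.assign(Technology=plants["Technology"].fillna("<missing>"))
    .groupby(["Fueltype", "Technology"])["Capacity"]
    .agg(["count", "sum"])
)
print(pairs)
```

Implausible rows such as `("Natural Gas", "Onshore")` then show up as their own line, along with how much capacity they carry.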
Expected Behavior
This type of data point should be considered invalid by default. I suggest some post-validation of the dataset that checks for allowed key pairs (e.g., `Fueltype = Wind` should only allow "Onshore" and "Offshore").
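A rough sketch of such a post-validation step, assuming a hand-maintained allow-list per fuel type. `ALLOWED_TECHNOLOGIES` and `invalid_pair_mask` are hypothetical names, not part of powerplantmatching. The list is deliberately incomplete: Wind follows the suggestion above, and Natural Gas uses the labels seen on master. A real version would need an entry for every technology the dataset actually uses, since anything missing from a list gets flagged.

```python
import pandas as pd

# Illustrative allow-list, NOT an official ppm list: the Wind entry follows
# the suggestion above, the Natural Gas entry uses labels seen on master.
ALLOWED_TECHNOLOGIES = {
    "Wind": {"Onshore", "Offshore"},
    "Natural Gas": {"CCGT", "Steam Turbine", "Combustion Engine"},
}


def invalid_pair_mask(plants: pd.DataFrame) -> pd.Series:
    """Flag rows whose Technology is not allowed for their Fueltype.

    Missing Technology values count as unknown, not invalid, and fuel types
    without an entry in ALLOWED_TECHNOLOGIES are not checked at all.
    """
    flags = [
        fuel in ALLOWED_TECHNOLOGIES
        and not pd.isna(tech)
        and tech not in ALLOWED_TECHNOLOGIES[fuel]
        for fuel, tech in zip(plants["Fueltype"], plants["Technology"])
    ]
    return pd.Series(flags, index=plants.index, dtype=bool)


# Toy data standing in for ppm.powerplants()
plants = pd.DataFrame({
    "Fueltype": ["Wind", "Wind", "Natural Gas", "Natural Gas", "Hydro"],
    "Technology": ["Onshore", "PV", "Onshore", None, "Reservoir"],
})
print(plants[invalid_pair_mask(plants)])
```

Treating a missing technology as "unknown" rather than "invalid" keeps the check from discarding rows that the matching step might still fill in. On the real dataset, the mask would be applied to the frame returned by `ppm.powerplants()`.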
Installed Versions
Hey @irm-codebase,
Could you double-check? I can't reproduce this on the latest master:
```python
>>> import powerplantmatching as ppm
>>> plants = ppm.powerplants(update=True)
>>> plants[(plants.Fueltype == "Wind")].Technology.unique()
array(['Onshore', 'Offshore', nan], dtype=object)
>>> plants[(plants.Fueltype == "Natural Gas")].Technology.unique()
array(['CCGT', 'Steam Turbine', nan, 'Not Found', 'Combustion Engine', 'not found'], dtype=object)
```
But some minor clean-up is necessary (e.g., `'Not Found'` vs. `'not found'`).
@lkstrp thanks for checking!
Unfortunately, I did manage to reproduce these problems on the main branch.
Here is some PV that apparently runs on biomass, oil, or natural gas:
And some Onshore wind that thinks natural gas is a clean fuel!
There are also inconsistent labels for some technologies (`Pv` / `PV`, or `Not Found` / `not found` / `unknown` / `<NA>` / `nan`), which might contribute to some of these odd results.
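A minimal normalization pass could collapse these variants before any validation. The sketch below assumes that case variants of "PV" mean the same thing and that the placeholder strings carry no information. `CANONICAL`, `PLACEHOLDERS`, and `normalize_technology` are hypothetical names, and the rules cover only the variants reported here.

```python
import numpy as np
import pandas as pd

# Hypothetical clean-up rules built from the variants reported in this thread:
# case variants of "PV" collapse to one label, placeholder strings become NaN
# ("nan" and "<na>" cover missing values that were stringified somewhere).
CANONICAL = {"pv": "PV"}
PLACEHOLDERS = {"not found", "unknown", "nan", "<na>"}


def normalize_technology(tech: pd.Series) -> pd.Series:
    """Return a copy with canonical labels and real NaN for missing values."""
    def fix(value):
        if pd.isna(value):
            return np.nan
        key = str(value).strip().lower()
        if key in PLACEHOLDERS:
            return np.nan
        return CANONICAL.get(key, value)

    return tech.map(fix)


tech = pd.Series(["Pv", "PV", "Not Found", "not found", "unknown", pd.NA, np.nan, "CCGT"])
print(normalize_technology(tech).tolist())
```

Mapping placeholders to a real NaN, rather than to one more canonical string, lets downstream code treat them as missing with `isna()`.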
I can confirm that these misallocations occur, though only for small capacity totals.
The inaccuracy stems from the matching step and from inferring the fuel type from plant names in some datasets. Some sites are combined sites hosting both a PV installation and a combustion power plant, hence the misallocation.
Ideally, placeholders like "Not Found" in unmatched sources would be marked as NaN, so that the matching algorithm can fill them in from lower-reliability datasets (if present).
I agree there should probably be a plausibility screening at the end.
@fneum pretty much. The issue is not the small capacity but the need to filter out this data after the fact, since it leads to processing errors.
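A toy illustration of why NaN matters here, using pandas' `DataFrame.combine_first` as a stand-in for the fill-in step. This is not powerplantmatching's matching algorithm. It assumes two sources that have already been matched to a shared, hypothetical plant ID.

```python
import numpy as np
import pandas as pd

# Hypothetical: two sources already matched to a shared plant ID (the index).
# pandas' combine_first stands in for the fill step; this is NOT ppm's matching code.
high_reliability = pd.DataFrame(
    {"Fueltype": ["Natural Gas", "Wind"], "Technology": [np.nan, "Onshore"]},
    index=["plant_a", "plant_b"],
)
low_reliability = pd.DataFrame(
    {"Fueltype": ["Natural Gas", "Wind"], "Technology": ["CCGT", "Offshore"]},
    index=["plant_a", "plant_b"],
)

# Non-missing values from the high-reliability source win; its NaNs are
# filled from the low-reliability source.
merged = high_reliability.combine_first(low_reliability)
print(merged)
```

If `plant_a` still carried the literal string "Not Found" instead of NaN, `combine_first` would keep it, because only missing values are filled.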