Erroneous `Fueltype` and `Technology` combinations in base dataset

Open irm-codebase opened this issue 7 months ago • 4 comments

Version Checks (indicate both or one)

  • [x] I have confirmed this bug exists on the latest release of powerplantmatching.

  • [x] I have confirmed this bug exists on the current master branch of powerplantmatching.

Issue Description

Some `Fueltype` / `Technology` combinations in the dataset are physically implausible.

Some examples:

  • Natural gas assigned to Onshore (wind) and Pv capacity

Image

  • Wind-driven PV

Image

I've seen this happen for the fuel types Lignite, Natural Gas, Wind and Oil.

Reproducible Example

import powerplantmatching as ppm
plants = ppm.powerplants()
plants[(plants.Fueltype == "Wind")].Technology.unique()
plants[(plants.Fueltype == "Natural Gas")].Technology.unique()

Expected Behavior

This type of data point should be considered invalid by default. I suggest some post-validation of the dataset that checks for valid key pairs (e.g., Fueltype = Wind should only allow "Onshore" and "Offshore").
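A minimal sketch of such a key-pair check, run on a toy DataFrame rather than the real dataset. The whitelist `ALLOWED_TECHNOLOGIES` and the helper `flag_implausible` are hypothetical names and not part of powerplantmatching. The Wind entry follows the suggestion above, and the Natural Gas entry only lists technologies reported elsewhere in this thread, so the real set of valid pairs would still have to be defined.

```python
import pandas as pd

# Hypothetical whitelist (assumption): fuel types not listed here are not checked.
ALLOWED_TECHNOLOGIES = {
    "Wind": ["Onshore", "Offshore"],
    "Natural Gas": ["CCGT", "Steam Turbine", "Combustion Engine"],
}


def flag_implausible(plants: pd.DataFrame) -> pd.Series:
    """Return a boolean mask that is True for implausible Fueltype/Technology rows."""
    mask = pd.Series(False, index=plants.index)
    has_technology = plants["Technology"].notna()  # missing values are not flagged
    for fueltype, allowed in ALLOWED_TECHNOLOGIES.items():
        is_fuel = plants["Fueltype"] == fueltype
        mask |= is_fuel & has_technology & ~plants["Technology"].isin(allowed)
    return mask


# Toy data standing in for ppm.powerplants()
toy = pd.DataFrame(
    {
        "Fueltype": ["Wind", "Wind", "Natural Gas", "Natural Gas", "Wind"],
        "Technology": ["Onshore", "PV", "Onshore", "CCGT", None],
    }
)
implausible = flag_implausible(toy)
print(toy[implausible])
```

On the toy data this flags the Wind/PV and Natural Gas/Onshore rows and leaves the row with a missing technology untouched. Whether flagged rows should be dropped, set to NaN, or only reported is a design decision left open here.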

Installed Versions

0.7.1

irm-codebase avatar Apr 09 '25 15:04 irm-codebase

Hey @irm-codebase,

could you double check? I can't reproduce this on the latest master:

>>> import powerplantmatching as ppm
>>> plants = ppm.powerplants(update=True)
>>> plants[(plants.Fueltype == "Wind")].Technology.unique()
array(['Onshore', 'Offshore', nan], dtype=object)
>>> plants[(plants.Fueltype == "Natural Gas")].Technology.unique()
array(['CCGT', 'Steam Turbine', nan, 'Not Found', 'Combustion Engine', 'not found'], dtype=object)

But some minor cleanup is necessary.

lkstrp avatar Apr 14 '25 10:04 lkstrp

@lkstrp thanks for checking!

I did manage to replicate these problems in the main branch, unfortunately.

Here is some PV that can consume biomass / oil / nat. gas:

Image

And some Onshore wind that thinks natural gas is a clean fuel!

Image

There are also inconsistent names for some technologies: Pv / PV or Not Found / not found / unknown / <NA> / nan, which might lead to some of these funky results.

Image
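A rough sketch of how the spellings listed above could be collapsed before any comparison. `normalize_technology`, `MISSING_LABELS` and `CANONICAL` are hypothetical names, and picking "PV" over "Pv" as the canonical spelling is an assumption, since the thread does not say which one is intended.

```python
import math

# Spellings from this thread that all mean "no technology known" (lower-cased).
MISSING_LABELS = {"not found", "unknown", "<na>", "nan", ""}

# Assumed canonical spellings, keyed by lower-cased label.
CANONICAL = {"pv": "PV"}


def normalize_technology(label):
    """Return a canonical label, or None when the value means 'missing'."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return None
    text = str(label).strip()
    if text.lower() in MISSING_LABELS:
        return None
    # Labels without a canonical entry are passed through unchanged.
    return CANONICAL.get(text.lower(), text)


for raw in ["Pv", "PV", "Not Found", "not found", "unknown", "<NA>", float("nan"), "Onshore"]:
    print(repr(raw), "->", repr(normalize_technology(raw)))
```

On a DataFrame this could be applied with `plants["Technology"].map(normalize_technology)`, which would leave a single missing marker and a single spelling per technology.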

irm-codebase avatar Apr 14 '25 13:04 irm-codebase

I can confirm that these misallocations occur (though only for small capacity totals).

This inaccuracy comes from the matching and from the fuel inference by name in some datasets. Some sites are combi-sites with both PV and a combustion power plant, hence the misallocation.

Ideally, placeholders like "Not Found" would be marked as NaN in the unmatched sources, so that the matching algorithm can fill them in from lower-reliability datasets (if present).
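To illustrate the idea with plain pandas (this is not the powerplantmatching matching code, only a sketch): a placeholder string counts as a real value and blocks the fill, while NaN lets the lower-reliability value through. The two Series and the `PLACEHOLDERS` list are made-up toy data.

```python
import numpy as np
import pandas as pd

# Placeholder spellings mentioned in this thread (assumed list, not exhaustive).
PLACEHOLDERS = ["Not Found", "not found", "unknown"]

# Two hypothetical sources describing the same three plants, aligned by index.
high_reliability = pd.Series(["CCGT", "Not Found", np.nan], name="Technology")
low_reliability = pd.Series(["Steam Turbine", "Onshore", "PV"], name="Technology")

# Without cleaning, "Not Found" is kept because it is not a missing value.
naive = high_reliability.combine_first(low_reliability)

# Turn placeholders into NaN first, then fill gaps from the other source.
cleaned = high_reliability.mask(high_reliability.isin(PLACEHOLDERS))
merged = cleaned.combine_first(low_reliability)

print(naive.tolist())
print(merged.tolist())
```

In the naive result the second plant keeps "Not Found", whereas after cleaning it takes "Onshore" from the lower-reliability source.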

Agreed, there should probably be a plausibility screening at the end.

fneum avatar Aug 21 '25 06:08 fneum

@fneum pretty much. The issue is not the small capacity, but the need to filter out this data after the fact, since it leads to processing errors.

irm-codebase avatar Aug 21 '25 07:08 irm-codebase