Unexpected behaviour with order of columns in IntercomparisonMetrics
While setting up a test for the new `PairwiseIntercomparisonMetrics` (#219) I stumbled on some (for me) unexpected behaviour. I used a test case with 4 datasets, and apparently the order of the datasets changed, leading to wrong results (see the example below).

The issue is caused by the order of the dataset names passed to `IntercomparisonMetrics`. It works when the `dataset_names` passed to `IntercomparisonMetrics` come from `get_dataset_names` using a `DataManager`. If one just passes the names of the datasets in a different order, the results are wrong. However, in the former case the wrong column (`c1`) is used as the reference.
I believe the problem is that pytesmo automatically converts a dataset dictionary to a `DataManager` (https://github.com/TUW-GEO/pytesmo/blob/e8cbda41c7556162894a93fc87c9f55e34bd136a/src/pytesmo/validation_framework/validation.py#L119) and from then on just uses an order derived from it. Based on this order, `perform_validation` then renames the columns of the DataFrame to `ref`, `k1`, `k2`, and `k3` (https://github.com/TUW-GEO/pytesmo/blob/e8cbda41c7556162894a93fc87c9f55e34bd136a/src/pytesmo/validation_framework/validation.py#L300). If the order of `dataset_names` was different, the columns end up mislabelled.

I think this should probably be fixed in `Validation`, since only `Validation` knows which datasets are the reference and which are the non-reference ones.
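To show the failure mode in isolation, here is a hypothetical sketch in plain pandas (not pytesmo's actual code path): renaming columns purely by position mislabels the data whenever the assumed order differs from the actual one.

```python
import pandas as pd

# Actual column order differs from the assumed one (reference is NOT first):
df = pd.DataFrame({"c1": [0.7], "ref": [0.5], "c2": [0.3]})

# Positional renaming that assumes the reference comes first:
df.columns = ["ref", "k1", "k2"]

# The column now labelled "ref" actually holds c1's values:
print(df["ref"].iloc[0])  # 0.7, not 0.5
```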
@wpreimes, any thoughts on this?
```python
from datetime import datetime

import numpy as np  # was missing in the original snippet
import pandas as pd

from pytesmo.validation_framework.validation import Validation
from pytesmo.validation_framework.metric_calculators import (
    IntercomparisonMetrics,
    get_dataset_names,
)
from pytesmo.validation_framework.data_manager import DataManager

# test data
startdate = datetime(2000, 1, 1)
enddate = datetime(2000, 12, 31)
dt_index = pd.date_range(start=startdate, end=enddate, freq='D')

names = ['ref', 'c1', 'c2', 'c3']
# always 0.5
df = pd.DataFrame(index=dt_index, data={
    name: np.repeat(0.5, dt_index.size) for name in names})
df['c1'] += 0.2  # some positive bias
df['c2'] -= 0.2  # some negative bias
df['c3'] -= 0.3  # some more negative bias
df
```
|            | ref | c1  | c2  | c3  |
|------------|-----|-----|-----|-----|
| 2000-01-01 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-01-02 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-01-03 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-01-04 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-01-05 | 0.5 | 0.7 | 0.3 | 0.2 |
| ...        | ... | ... | ... | ... |
| 2000-12-27 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-12-28 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-12-29 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-12-30 | 0.5 | 0.7 | 0.3 | 0.2 |
| 2000-12-31 | 0.5 | 0.7 | 0.3 | 0.2 |

366 rows × 4 columns
The bias between `c1` and `ref` is 0.2.
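As a quick standalone sanity check of that number (re-creating the test data, independent of pytesmo):

```python
import numpy as np
import pandas as pd

dt_index = pd.date_range("2000-01-01", "2000-12-31", freq="D")
df = pd.DataFrame({name: np.repeat(0.5, dt_index.size)
                   for name in ["ref", "c1", "c2", "c3"]}, index=dt_index)
df["c1"] += 0.2

# bias(c1, ref) = mean(c1 - ref)
bias = (df["c1"] - df["ref"]).mean()
print(bias)  # ≈ 0.2 (up to float rounding)
```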
```python
# set up datasets for validation
class DummyReader:
    def __init__(self, df, name):
        self.name = name
        self.data = pd.DataFrame(df[name])

    def read_ts(self, *args, **kwargs):
        return self.data

    def __repr__(self):
        return f"Reader for {self.name} with columns {list(self.data.columns)}"


datasets = {
    key + "name": {"columns": [key], "class": DummyReader(df, key)}
    for key in df
}
datasets
```
```
{'refname': {'columns': ['ref'], 'class': Reader for ref with columns ['ref']},
 'c1name': {'columns': ['c1'], 'class': Reader for c1 with columns ['c1']},
 'c2name': {'columns': ['c2'], 'class': Reader for c2 with columns ['c2']},
 'c3name': {'columns': ['c3'], 'class': Reader for c3 with columns ['c3']}}
```
```python
# run validation
metrics = IntercomparisonMetrics(dataset_names=list(datasets.keys()))

val = Validation(
    datasets,
    "refname",
    scaling=None,
    metrics_calculators={(4, 4): metrics.calc_metrics}
)

results = val.calc(1, 1, 1)
```
```
/home/samuel/.local/lib/python3.8/site-packages/scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())
/home/samuel/miniconda3/envs/pytesmo/lib/python3.8/site-packages/numpy/lib/function_base.py:2642: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/home/samuel/projects/QA4SM/pytesmo/src/pytesmo/scaling.py:250: RuntimeWarning: invalid value encountered in true_divide
  return ((src - np.mean(src)) / np.std(src)) * np.std(ref) + np.mean(ref)
```
```python
results[list(results.keys())[0]]["BIAS_between_refname_and_c1name"]
```

```
array([0.4], dtype=float32)
```
The bias should be 0.2 here.
When using `get_dataset_names` instead of `list(datasets.keys())` it works correctly:
```python
datamanager = DataManager(datasets, ref_name="refname")
ds_names = get_dataset_names(datamanager.reference_name, datamanager.datasets, n=4)
ds_names, list(datasets.keys())
```

```
(['c1name', 'c2name', 'c3name', 'refname'],
 ['refname', 'c1name', 'c2name', 'c3name'])
```
```python
metrics = IntercomparisonMetrics(dataset_names=ds_names)

val = Validation(
    # datamanager,
    datasets,
    "refname",
    temporal_ref="refname",
    scaling=None,
    metrics_calculators={(4, 4): metrics.calc_metrics}
)

results = val.calc(1, 1, 1)
```
```
/home/samuel/.local/lib/python3.8/site-packages/scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())
/home/samuel/miniconda3/envs/pytesmo/lib/python3.8/site-packages/numpy/lib/function_base.py:2642: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/home/samuel/projects/QA4SM/pytesmo/src/pytesmo/scaling.py:250: RuntimeWarning: invalid value encountered in true_divide
  return ((src - np.mean(src)) / np.std(src)) * np.std(ref) + np.mean(ref)
```
```python
results[list(results.keys())[0]]["BIAS_between_c1name_and_refname"]
```

```
array([0.2], dtype=float32)
```
I introduced a new keyword argument `rename_cols` to `Validation.calc` and `Validation.perform_validation` that can be used to disable the renaming. This will be included in PR #219 and mitigates the issue described above. However, I still think the renaming is a hacky and fragile construct, and we should probably find a way to remove it in the future.
During testing I stumbled on a related problem. If I use a metrics calculator with `(n, 2)` in `Validation`, e.g.:
```python
# Note: reuses the DummyReader class defined above;
# PairwiseIntercomparisonMetrics comes from PR #219.
dr = pd.date_range("2000", "2020", freq="D")
n = len(dr)
x = np.ones(n) * 2

df = pd.DataFrame(
    {
        "reference": x,
        "zplus2": x + 2,
        "plus1": x + 1,
    },
    index=dr
)

datasets = {
    key + "_name": {"columns": [key], "class": DummyReader(df, key)}
    for key in df
}

val = Validation(
    datasets,
    "reference_name",
    scaling=None,  # doesn't work with the constant test data
    temporal_matcher=None,
    metrics_calculators={
        (3, 2): PairwiseIntercomparisonMetrics().calc_metrics
    }
)

results = val.calc(1, 1, 1)
```
Then the results dictionary has these keys:

```
(('plus1_name', 'plus1'), ('reference_name', 'reference')),
(('plus1_name', 'plus1'), ('zplus2_name', 'zplus2')),
(('reference_name', 'reference'), ('zplus2_name', 'zplus2'))
```

that is, the tuples are sorted alphabetically.
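A standalone sketch (plain Python, no pytesmo) of why alphabetical sorting moves the reference around: the reference lands first or second depending only on how its name compares with the other dataset's name.

```python
ref = ("reference_name", "reference")

# Sort each key pair alphabetically, as the results dictionary does:
pair1 = tuple(sorted([("plus1_name", "plus1"), ref]))
pair2 = tuple(sorted([("zplus2_name", "zplus2"), ref]))

print(pair1)  # reference second: "plus1_name" < "reference_name"
print(pair2)  # reference first:  "reference_name" < "zplus2_name"
```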
When using the new `CombinedTemporalMatcher` that I implemented (#219), which only returns combinations including the reference, the keys are:

```
(('plus1_name', 'plus1'), ('reference_name', 'reference')),
(('reference_name', 'reference'), ('zplus2_name', 'zplus2'))
```

In one case the reference comes first, in the other it comes second, which is pretty unintuitive for a user. I would expect the keys to always be `(other, ref)` or always `(ref, other)`, and not to depend on the names of the datasets. The temporal matcher returns the matched pairs with keys `(other, ref)`, which is what I was expecting to get.
I know that pytesmo by default computes all combinations, and in that case a general rule is harder. But I think pytesmo should respect the order given by the temporal matcher. Furthermore, since dictionary order is now reliable (an implementation detail since CPython 3.6, guaranteed by the language since Python 3.7), the default matcher should probably also respect the order of the datasets in the `DataManager`/datasets dictionary, if it doesn't do so already.
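A tiny sketch of the ordering guarantee this would rely on:

```python
# Since Python 3.7 the language guarantees that dicts preserve insertion
# order, so the datasets mapping itself already defines a dependable order:
datasets = {"refname": 1, "c1name": 2, "c2name": 3, "c3name": 4}
order = list(datasets)
print(order)  # ['refname', 'c1name', 'c2name', 'c3name']
```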
Addendum: this reordering is especially a problem when calculating bias, because the sign differs between the two orderings, so simply using `tuple(sorted(key))` does not allow interpreting the result correctly.
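A minimal standalone illustration of why a sorted key loses information for bias:

```python
import numpy as np

ref = np.full(5, 0.5)
other = np.full(5, 0.7)

# Bias is antisymmetric in its arguments:
bias_other_ref = np.mean(other - ref)  # ≈ 0.2
bias_ref_other = np.mean(ref - other)  # ≈ -0.2

# After sorting the key tuple, you can no longer tell which direction
# the stored bias value refers to, so its sign is ambiguous.
print(bias_other_ref, bias_ref_other)
```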
So I think this needs to be fixed if we want to incorporate the new `PairwiseIntercomparisonMetrics`.

@wpreimes, do you have an idea how far-reaching those changes would be? Is this order relied upon in many places in pytesmo?