Unexpected behaviour with order of columns in IntercomparisonMetrics

Open s-scherrer opened this issue 3 years ago • 2 comments

When setting up a test for the new PairwiseIntercomparisonMetrics (#219) I stumbled on some (for me) unexpected behaviour.

I used a test case with 4 datasets, and apparently the order of the datasets changed, leading to wrong results (see example below). The issue is caused by the order of the dataset names passed to IntercomparisonMetrics. It works when the dataset_names passed to IntercomparisonMetrics come from get_dataset_names using a DataManager; if one just passes the names of the datasets in a different order, the results are wrong. However, even in the former case the wrong column (c1) is used as reference.

I believe the problem is that pytesmo automatically converts a dataset dictionary to a DataManager (https://github.com/TUW-GEO/pytesmo/blob/e8cbda41c7556162894a93fc87c9f55e34bd136a/src/pytesmo/validation_framework/validation.py#L119) and afterwards just uses an order derived from it.

Based on this order, perform_validation then renames the columns of the DataFrame to ref, k1, k2, and k3 (https://github.com/TUW-GEO/pytesmo/blob/e8cbda41c7556162894a93fc87c9f55e34bd136a/src/pytesmo/validation_framework/validation.py#L300), but if the order of dataset_names was different, the columns are not in the order they should be.

I think this should probably be fixed in Validation, since only Validation knows which datasets are the reference and which are not.
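
As I understand it, the failure mode is roughly this: the renamed columns follow one ordering of the datasets, the labels produced by the metric calculator follow the ordering of dataset_names, and the two are paired by position. A minimal abstract sketch (not pytesmo's actual code):

# abstract sketch of the mismatch: pairing labels with columns by position
# only works if both orderings agree
cols = ["ref", "k1", "k2", "k3"]
order_a = ["refname", "c1name", "c2name", "c3name"]  # e.g. list(datasets.keys())
order_b = ["c1name", "c2name", "c3name", "refname"]  # e.g. get_dataset_names(...)

print(dict(zip(cols, order_a)))
print(dict(zip(cols, order_b)))  # the same columns now carry different labels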

@wpreimes, any thoughts on this?

from datetime import datetime

import numpy as np
import pandas as pd

from pytesmo.validation_framework.validation import Validation
from pytesmo.validation_framework.metric_calculators import IntercomparisonMetrics, get_dataset_names
from pytesmo.validation_framework.data_manager import DataManager

# test data
startdate = datetime(2000, 1, 1)
enddate = datetime(2000, 12, 31)
dt_index = pd.date_range(start=startdate, end=enddate, freq='D')

names = ['ref', 'c1', 'c2', 'c3']
# always 0.5
df = pd.DataFrame(index=dt_index, data={
                  name: np.repeat(0.5, dt_index.size) for name in names})
df['c1'] += 0.2  # some positive bias
df['c2'] -= 0.2  # some negative bias
df['c3'] -= 0.3  # some more negative bias
df
ref c1 c2 c3
2000-01-01 0.5 0.7 0.3 0.2
2000-01-02 0.5 0.7 0.3 0.2
2000-01-03 0.5 0.7 0.3 0.2
2000-01-04 0.5 0.7 0.3 0.2
2000-01-05 0.5 0.7 0.3 0.2
... ... ... ... ...
2000-12-27 0.5 0.7 0.3 0.2
2000-12-28 0.5 0.7 0.3 0.2
2000-12-29 0.5 0.7 0.3 0.2
2000-12-30 0.5 0.7 0.3 0.2
2000-12-31 0.5 0.7 0.3 0.2

366 rows × 4 columns

Bias between c1 and ref is 0.2.
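
A quick sanity check of that value, assuming bias here is simply the difference of the means:

print(df["c1"].mean() - df["ref"].mean())  # approximately 0.2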

# set up datasets for validation
class DummyReader:
    def __init__(self, df, name):
        self.name = name
        self.data = pd.DataFrame(df[name])

    def read_ts(self, *args, **kwargs):
        return self.data
    
    def __repr__(self):
        return f"Reader for {self.name} with columns {list(self.data.columns)}"

datasets = {
    key+"name": {"columns": [key], "class": DummyReader(df, key)}
    for key in df
}
datasets
{'refname': {'columns': ['ref'], 'class': Reader for ref with columns ['ref']},
 'c1name': {'columns': ['c1'], 'class': Reader for c1 with columns ['c1']},
 'c2name': {'columns': ['c2'], 'class': Reader for c2 with columns ['c2']},
 'c3name': {'columns': ['c3'], 'class': Reader for c3 with columns ['c3']}}
# run validation
metrics = IntercomparisonMetrics(dataset_names=list(datasets.keys()))
val = Validation(
    datasets,
    "refname",
    scaling=None,
    metrics_calculators={(4, 4): metrics.calc_metrics}
)
results = val.calc(1, 1, 1)
/home/samuel/.local/lib/python3.8/site-packages/scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())
/home/samuel/miniconda3/envs/pytesmo/lib/python3.8/site-packages/numpy/lib/function_base.py:2642: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/home/samuel/projects/QA4SM/pytesmo/src/pytesmo/scaling.py:250: RuntimeWarning: invalid value encountered in true_divide
  return ((src - np.mean(src)) / np.std(src)) * np.std(ref) + np.mean(ref)
results[list(results.keys())[0]]["BIAS_between_refname_and_c1name"]
array([0.4], dtype=float32)

The bias should be 0.2 here.
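For what it's worth, 0.4 is not the mean difference of any pair involving ref in the test data; the only pair with a mean difference of 0.4 is c1/c2, which I read as the labels being shifted:

# check which pair of test columns actually has a mean difference of 0.4
from itertools import combinations
for a, b in combinations(names, 2):
    print(a, b, round(df[a].mean() - df[b].mean(), 2))
# only c1 - c2 gives 0.4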

When using get_dataset_names instead of list(datasets.keys()) it works correctly:

datamanager = DataManager(datasets, ref_name="refname")
ds_names = get_dataset_names(datamanager.reference_name, datamanager.datasets, n=4)
ds_names, list(datasets.keys())
(['c1name', 'c2name', 'c3name', 'refname'],
 ['refname', 'c1name', 'c2name', 'c3name'])
metrics = IntercomparisonMetrics(dataset_names=ds_names)
val = Validation(
    #datamanager,
    datasets,
    "refname",
    temporal_ref="refname",
    scaling=None,
    metrics_calculators={(4, 4): metrics.calc_metrics}
)
results = val.calc(1, 1, 1)
/home/samuel/.local/lib/python3.8/site-packages/scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())
/home/samuel/miniconda3/envs/pytesmo/lib/python3.8/site-packages/numpy/lib/function_base.py:2642: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/home/samuel/projects/QA4SM/pytesmo/src/pytesmo/scaling.py:250: RuntimeWarning: invalid value encountered in true_divide
  return ((src - np.mean(src)) / np.std(src)) * np.std(ref) + np.mean(ref)
results[list(results.keys())[0]]["BIAS_between_c1name_and_refname"]
array([0.2], dtype=float32)

s-scherrer avatar Mar 12 '21 18:03 s-scherrer

I introduced a new keyword argument rename_cols to Validation.calc and Validation.perform_validation that can be used to disable the renaming. This will be included in PR #219 and somewhat mitigates the issue above. However, I still think the renaming is a very hacky and fragile construct, and we should probably find a way to remove it in the future.
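
From the user side this would look roughly like the following (a sketch, assuming the keyword is exposed on Validation.calc as described; the exact signature may still change in the PR):

# sketch: disable the ref/k1/k2/... renaming so results keep the original column names
results = val.calc(1, 1, 1, rename_cols=False)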

During testing I stumbled on a related problem: if I use a metrics calculator registered under (n, 2) in Validation, e.g.:

# assumes numpy/pandas and DummyReader from the example above, plus the new
# PairwiseIntercomparisonMetrics from PR #219
from pytesmo.validation_framework.metric_calculators import PairwiseIntercomparisonMetrics

dr = pd.date_range("2000", "2020", freq="D")
n = len(dr)
x = np.ones(n) * 2
df = pd.DataFrame(
    {
        "reference": x,
        "zplus2": x + 2,
        "plus1": x + 1,
    },
    index=dr
)

datasets = {
    key + "_name": {"columns": [key], "class": DummyReader(df, key)}
    for key in df
}

val = Validation(
    datasets,
    "reference_name",
    scaling=None,  # doesn't work with the constant test data
    temporal_matcher=None,
    metrics_calculators={
        (3, 2): PairwiseIntercomparisonMetrics().calc_metrics
    }
)
results = val.calc(1, 1, 1)

Then the results dictionary has these keys:

(('plus1_name', 'plus1'), ('reference_name', 'reference')), 
(('plus1_name', 'plus1'), ('zplus2_name', 'zplus2')), 
(('reference_name', 'reference'), ('zplus2_name', 'zplus2'))

that is, the tuples are sorted alphabetically.

When using the new CombinedTemporalMatcher that I implemented (#219), which only returns combinations that include the reference, the keys are:

(('plus1_name', 'plus1'), ('reference_name', 'reference')), 
(('reference_name', 'reference'), ('zplus2_name', 'zplus2'))

In one case the reference comes first, in the other it comes second, which is pretty unintuitive for a user. I would expect the keys to always be (other, ref), or always (ref, other), rather than depending on the names of the datasets. The temporal matcher returns the matched pairs with keys (other, ref), which is what I was expecting to get here as well.
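
The ordering above looks like plain lexicographic sorting of the key tuples, which is easy to reproduce:

# whichever dataset name sorts first alphabetically ends up in the first slot,
# regardless of which dataset is the reference
print(sorted([("reference_name", "reference"), ("plus1_name", "plus1")]))
# [('plus1_name', 'plus1'), ('reference_name', 'reference')]
print(sorted([("reference_name", "reference"), ("zplus2_name", "zplus2")]))
# [('reference_name', 'reference'), ('zplus2_name', 'zplus2')]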

I know that pytesmo by default evaluates all combinations, and in that case a general rule is harder. But I think pytesmo should respect the order given by the temporal matcher. Furthermore, since dictionary insertion order is now reliable (an implementation detail of CPython 3.6, part of the language specification since Python 3.7), the default matcher should probably also respect the order of the datasets in the DataManager/datasets dictionary, if it doesn't do this already.
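
If we don't want to rely on sorting at all, a small post-processing helper could pin the reference to a fixed position; this is a hypothetical sketch, not existing pytesmo API:

# hypothetical helper: reorder a result key so the reference dataset always comes last,
# i.e. ((other, other_col), ..., (ref, ref_col)); sorted() is stable, so the relative
# order of the non-reference datasets is preserved
def normalize_key(key, ref_name):
    return tuple(sorted(key, key=lambda pair: pair[0] == ref_name))

print(normalize_key(
    (("reference_name", "reference"), ("zplus2_name", "zplus2")), "reference_name"
))
# (('zplus2_name', 'zplus2'), ('reference_name', 'reference'))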

s-scherrer avatar Mar 18 '21 17:03 s-scherrer

Addendum: This reordering is especially a problem when calculating bias, because the sign differs between the two orderings, and simply using tuple(sorted(key)) is not enough to interpret the result correctly.
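
Minimal illustration of the sign problem (bias taken as the difference of the means):

import numpy as np
x = np.full(10, 0.7)  # c1-like series
y = np.full(10, 0.5)  # ref-like series
# swapping the pair flips the sign, so after tuple(sorted(key)) you can no
# longer tell which direction was computed
print(np.mean(x) - np.mean(y))  # approximately 0.2
print(np.mean(y) - np.mean(x))  # approximately -0.2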

So I think this needs to be fixed if we want to incorporate the new PairwiseIntercomparisonMetrics.

@wpreimes do you have an idea how far-reaching those changes would be? Is this ordering relied upon a lot in pytesmo?

s-scherrer avatar Mar 18 '21 17:03 s-scherrer