
Feature request: ttest_ind

Open · ahuang11 opened this issue 5 years ago · 13 comments

Adding a two-tailed test? https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
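For reference, a minimal sketch of what the scipy function does today (the sample data here is made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=100)  # e.g. forecast anomalies
b = rng.normal(0.2, 1.0, size=100)  # e.g. reference values

t_stat, p_val = stats.ttest_ind(a, b)  # two-tailed by default
print(t_stat, p_val)
```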

ahuang11 commented Sep 08 '20 14:09

Would this be different from https://esmtools.readthedocs.io/en/latest/api/esmtools.testing.ttest_ind_from_stats.html?

In theory there are many tests that could be added to xskillscore, but is the t-test applicable to forecasting, and hence to xskillscore?

aaronspring commented Sep 08 '20 14:09

I think my use case is relevant to forecasting (seeing whether the forecast anomaly values are statistically significantly different from some reference values), but I wasn't aware that xskillscore is limited to forecasting; I thought it was simply a collection of scores for use with xarray/dask. In fact, I think ttest_ind_from_stats would be better suited to xskillscore than esmtools, because ttest_ind isn't limited to ESM analysis (hence its home in scipy). Also, ttest_ind is similar to pearson_r_p_value in my opinion: instead of doing statistical significance testing on correlations, it does it on differences.

ahuang11 commented Sep 08 '20 15:09

I think my use case is relevant to forecasting (seeing whether the forecast anomaly values are statistically significantly different from some reference values), but I wasn't aware that xskillscore is limited to forecasting; I thought it was simply a collection of scores for use with xarray/dask. In fact, I think ttest_ind_from_stats would be better suited to xskillscore than esmtools, because ttest_ind isn't limited to ESM analysis (hence its home in scipy).

We are entering the confusing web of scope creep :) My view is:

  1. esmtools is really just my personal package of nice methods I find myself using frequently. That happens to be focused on Earth System analysis, so I called it that. It's a junk drawer for functions though.
  2. xskillscore's tagline is "Metrics for verifying forecasts", so it should focus on statistical metrics for forecast verification. It can't really enter the territory of true skill scores, since you need three inputs: (a) the dynamical forecast, (b) the verification data, (c) the reference forecast (e.g. persistence or climatology). To get BSS, for instance, you need to run xs.brier_score(a, b) and xs.brier_score(c, b) and then derive from there (see the sketch after this list). It could be done, but it seems like scope creep.
  3. climpred is a high-level package for forecast verification. It handles reference forecasts, bias correction, etc. That's where things like BSS should live, which will pull in low-level metrics from xskillscore.
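For concreteness, the BSS derivation in item 2 would look something like this. This is a minimal sketch: `fcst` (dynamical forecast probabilities), `verif` (binary verification data), and `ref` (reference forecast probabilities) are hypothetical aligned xr.DataArrays, with arguments ordered as in the xs.brier_score(a, b) usage above.

```python
import xskillscore as xs

bs_fcst = xs.brier_score(fcst, verif)  # BS of the dynamical forecast
bs_ref = xs.brier_score(ref, verif)    # BS of the reference forecast
bss = 1 - bs_fcst / bs_ref             # > 0 means skill over the reference
```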

bradyrx commented Sep 08 '20 15:09

But yes, my question would be: if you're looking for statistical significance that isn't bootstrapping, why not pearson_r_p_value? It uses the t-test to get your p-value. However, I think you can derive the critical correlation coefficient (i.e. the minimum correlation needed for significance) by a simple manipulation of the t statistic, as sketched below. So I would be for adding it here. In the end, you'd get a gridded t-test value for a comparison of two gridded results. I don't care if there's redundancy between here and esmtools, because obviously xskillscore is used much more widely than my personal package.
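Roughly, the manipulation I mean (a sketch, not xskillscore API; `n` and `alpha` are illustrative): the t statistic for a correlation is t = r * sqrt(n - 2) / sqrt(1 - r**2), so inverting it at the critical t gives the minimum correlation needed for significance.

```python
import numpy as np
from scipy import stats

n = 30                                   # sample size (illustrative)
alpha = 0.05                             # two-tailed significance level
df = n - 2
t_crit = stats.t.ppf(1 - alpha / 2, df)  # critical t value
# invert t = r * sqrt(df) / sqrt(1 - r**2) for the critical correlation
r_crit = t_crit / np.sqrt(t_crit**2 + df)
```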

bradyrx commented Sep 08 '20 15:09

Since I am not calculating any correlations, I don't think pearson_r_p_value is relevant, but I could be wrong.

For example, I commonly use pearson_r/pearson_r_p_value for anomaly correlations, e.g. (forecast - climatology) correlated against (analysis - climatology).

In this case I don't have a climatology; I simply have (forecast - reference), and I think a t-test would be more suitable here?

I could be wrong though~
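To illustrate the distinction, a hedged sketch: `fcst`, `obs`, `clim`, and `ref` are hypothetical aligned xr.DataArrays with a "time" dimension.

```python
import xskillscore as xs
from scipy import stats

# anomaly correlation: forecast anomalies against analysis anomalies
acc = xs.pearson_r(fcst - clim, obs - clim, dim="time")

# no climatology available: test whether forecast and reference differ
t_stat, p_val = stats.ttest_ind(fcst.values, ref.values, axis=0)
```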

ahuang11 commented Sep 08 '20 15:09

  1. xskillscore's tagline is "Metrics for verifying forecasts", so it should focus on statistical metrics for forecast verification. It can't really enter the territory of true skill scores, since you need three inputs: (a) the dynamical forecast, (b) the verification data, (c) the reference forecast (e.g. persistence or climatology). To get BSS, for instance, you need to run xs.brier_score(a, b) and xs.brier_score(c, b) and then derive from there. It could be done, but it seems like scope creep.

At the same time, the package name is xskillscore, so should I trust the package name or the tagline more, hah? Maybe xskillscore could be renamed to xscore or xverif, similar to how xarray was renamed from xray.

Anyway, I don't think ttest_ind is a skill score, so I think it could be added here, similar to the pearson_r p-value.

ahuang11 commented Sep 08 '20 15:09

To add to the conversation: I'm seeing tests appear in the ML space, e.g. https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.stat

I'm fine with adding these tests, as I'm personally interested in extending this package for use in ML workloads.

I see this repo as a host of ufuncs that can easily be applied to xarray objects (and to pd.DataFrame.to_xarray()). The scope of those ufuncs is "how different/similar are two distributions": this started with deterministic metrics (correlation and its p-value, RMSE, etc.), then we added probabilistic metrics and, recently, contingency tables. I'm fine with adding stat tests, which I imagine can be slotted in here: https://github.com/raybellwaves/xskillscore/blob/master/xskillscore/tests/test_metric_results_accurate.py#L28

raybellwaves commented Sep 08 '20 19:09

We just recently officially modified the climpred tagline to broaden our scope: "Verification of weather and climate forecasts." We could broaden this package's tagline to be general as well. Something like: "Vectorized statistical assessments for forecasts and machine learning workloads."

The point, @raybellwaves, is to use these as minimization functions for training an ML model, for instance? And then of course tests like ttest_ind and https://github.com/raybellwaves/xskillscore/pull/176 can evaluate different models' performance.

bradyrx commented Sep 08 '20 19:09

Moving tagline discussion to https://github.com/raybellwaves/xskillscore/issues/180

testing like ttest_ind and #176 can evaluate different model performances

Yes, I like the test module @aaronspring created in https://github.com/raybellwaves/xskillscore/pull/176

raybellwaves commented Sep 08 '20 20:09

So to clarify, is it okay to add ttest_ind to xskillscore?

I view xskillscore as my one-stop shop for all xarray scoring~

ahuang11 commented Sep 08 '20 22:09

@ahuang11, yes, I would add it to the new test.py module that @aaronspring is developing in #176. Another thing I have code for that is useful is a Fisher z transformation to test statistical significance between two different grids of correlations (rough sketch below).
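Roughly what that Fisher z test looks like (a sketch, not the actual esmtools code; `r1` and `r2` are hypothetical correlation grids computed from samples of sizes `n1` and `n2`):

```python
import numpy as np
from scipy import stats

z1, z2 = np.arctanh(r1), np.arctanh(r2)    # Fisher z transform
se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
z = (z1 - z2) / se
p = 2 * stats.norm.sf(np.abs(z))           # two-tailed p-value
```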

bradyrx commented Sep 08 '20 22:09

I've been testing xr.apply_ufunc with scipy.stats.ttest_ind and the runtime feels kind of slow, especially when applying it for each lat/lon. Maybe we could try rewriting it in numpy.
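For context, the kind of wrapper I've been timing (a sketch; the helper and dimension names are mine, not xskillscore API; `fcst` and `ref` are hypothetical DataArrays sharing a "time" dimension):

```python
import xarray as xr
from scipy import stats

def _ttest_ind_pvalue(a, b):
    # apply_ufunc moves the "time" core dim to the last axis,
    # so run the test along axis=-1 and keep only the p-value
    return stats.ttest_ind(a, b, axis=-1).pvalue

p_val = xr.apply_ufunc(
    _ttest_ind_pvalue,
    fcst,
    ref,
    input_core_dims=[["time"], ["time"]],
    dask="parallelized",
    output_dtypes=[float],
)
```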

ahuang11 commented Sep 10 '20 21:09

Moved to https://github.com/xarray-contrib/xskillscore/issues/284; leaving this open.

raybellwaves commented Mar 18 '21 19:03

See https://github.com/dougiesquire/xstatstests

raybellwaves commented Aug 18 '22 20:08

I'm happy to port xstatstests into xskillscore if we think that's appropriate. It didn't feel right to me when I first thought about it, as they aren't skill scores, but maybe others feel differently?

dougiesquire commented Aug 18 '22 22:08

I think it's fine to have them separate. The few tests we have here are all related to one forecast being better than another.

aaronspring commented Aug 19 '22 08:08