StatsAPI.jl Add a function for extracting a test statistic

This is added to StatsAPI rather than to HypothesisTests since StatsAPI also houses HypothesisTest, pvalue, and other relevant functionality.

Defining this function will bring a resolution to the 7-year-old issue https://github.com/JuliaStats/HypothesisTests.jl/issues/79, which has received a number of duplicates over the years, suggesting that it would be of general interest.

I considered the name statistic, and still rather prefer that (teststatistic has too many s's and t's 😩), but wasn't sure whether it was insufficiently descriptive. I'd be interested in hearing thoughts both on naming and on whether this should be required for HypothesisTest as pvalue is or whether it should be optional (what I have currently).

Jul 29 '23 19:07 ararslan
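
For readers following the proposal, here is a minimal sketch of what an optional accessor of this kind could look like. It assumes the name `teststatistic`, one of the candidates under discussion, and uses a hypothetical `MockStatsAPI` module rather than the real StatsAPI; `ToyZTest` and `OtherTest` are likewise made up for illustration and are not part of any package.

```julia
# Hypothetical stand-in for StatsAPI; the real package is not loaded here.
module MockStatsAPI

abstract type HypothesisTest end

"""
    teststatistic(test::HypothesisTest)

Return the value of the test statistic of `test`.
"""
function teststatistic end  # generic function with no methods, so implementing it is optional

end # module

# A toy test type, as a downstream package might define it (illustrative only).
struct ToyZTest <: MockStatsAPI.HypothesisTest
    xbar::Float64   # sample mean
    mu0::Float64    # mean under the null hypothesis
    sigma::Float64  # known standard deviation
    n::Int          # sample size
end

# Extending the shared function adds a method to the existing name.
MockStatsAPI.teststatistic(t::ToyZTest) = (t.xbar - t.mu0) / (t.sigma / sqrt(t.n))

# A test type that does not opt in.
struct OtherTest <: MockStatsAPI.HypothesisTest end

z = MockStatsAPI.teststatistic(ToyZTest(1.0, 0.0, 2.0, 16))
has_other = hasmethod(MockStatsAPI.teststatistic, Tuple{OtherTest})
```

Because the function is declared without a fallback method, a type that does not implement it simply has no method, and calling it would raise a `MethodError`. Making the function required would be a documentation-level contract rather than something this declaration enforces.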

Codecov Report

Patch and project coverage have no change.

Comparison is base (64d7d28) 100.00% compared to head (27e6dc6) 100.00%.

@@            Coverage Diff            @@
##              main       #24   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines           37        37           
=========================================
  Hits            37        37

Files Changed	Coverage Δ
src/StatsAPI.jl	`100.00% <ø> (ø)`

Jul 29 '23 19:07 codecov-commenter

@nalimilan, did you have thoughts on the function name? I find statistic appealing. Looks like @babaq would prefer teststat (at least over teststatistic), which seems okay I guess but I generally like to avoid shortening words when possible.

Jul 29 '23 23:07 ararslan

I'm afraid statistic is too general so I prefer teststatistic or teststat. I'm not sure whether it's best to abbreviate or not, as we are quite inconsistent in that regard (e.g. confint vs. loglikelihood).

Jul 30 '23 14:07 nalimilan

I like teststatistic most, I think. I agree that statistic seems a bit too general, and I prefer not shortening the function name (but I'd be fine with teststat as well).

Jul 31 '23 16:07 devmotion

statistic feels like it’s asking for a name collision and teststatistic is long and awkward. My ranking: teststat > statistic >> teststatistic

Jul 31 '23 16:07 palday

statistic feels like it’s asking for a name collision

Only two registered packages define a function called statistic: Bootstrap and Hecke. Bootstrap's definition could/should extend this one from StatsAPI. Hecke defines statistic but doesn't use, document, or export it; it seems to be dead code.

MixedAnova, AnovaBase, and WildBootTests all define a teststat function.

Amusingly, HypothesisTests defines teststatistic, which I didn't realize. It's only defined for VarianceEqualityTest and is neither documented nor exported, though.

Jul 31 '23 16:07 ararslan

So actually, wouldn't the generality of statistic be a reasonable thing for an interface function? After all, the whole point of StatsAPI is for packages to share names. 😄 The meaning becomes unambiguous in the context of the type of the input.

Jul 31 '23 18:07 ararslan

After a little bit of thinking, statistic actually sounds nice because it could also be used in other, non-testing contexts.

Jul 31 '23 21:07 palday

People often complain that we abuse generic functions by overloading them with methods which actually have little or nothing in common, so I thought using a more specific name like teststat would be more appropriate.

Maybe ping the authors of the packages you mentioned to get their opinion? We definitely want all packages to use the same function, so we need some of them to agree to switch to the new function.

Aug 04 '23 20:08 nalimilan

People often complain that we abuse generic functions by overloading them with methods which actually have little or nothing in common

I was not aware of this. What are other examples?

Aug 04 '23 22:08 ararslan

Maybe ping the authors of the packages you mentioned to get their opinion? We definitely want all packages to use the same function, so we need some of them to agree to switch to the new function.

I think present company have HypothesisTests covered and I don't think this is relevant for Hecke, but @yufongpeng for AnovaBase/MixedAnova, @droodman for WildBootTests, and @juliangehring for Bootstrap: hello! Thanks for contributing to the Julia ecosystem. We're thinking of introducing a function in StatsAPI which, if named generically, could be useful to your respective packages. For AnovaBase, MixedAnova, and WildBootTests, it would correspond to the function @yufongpeng and @droodman have called teststat. For Bootstrap, it would correspond to what @juliangehring has called statistic. Since your respective packages all have StatsAPI as a transitive dependency already (by way of Distributions for WildBootTests and StatsBase for the others), extending the function defined here would not require taking on additional dependencies that wouldn't already need to be loaded. If you would be interested in integrating in this way, we would love your input here! What would be your preferred name for this function? Current contenders are:

statistic
teststatistic
teststat

This was originally motivated by the need for a generic accessor function to extract the value of a test statistic from a HypothesisTest object but its scope does not need to be limited to that.

Aug 04 '23 22:08 ararslan

Since this function is for test statistics, I prefer a more specific name. teststat or teststatistic is good, but statistic is too general.

Aug 05 '23 02:08 yufongpeng

I agree.

I'm happy to conform to any standards developed assuming it makes sense for my package.

Aug 05 '23 09:08 droodman

Thank you @yufongpeng and @droodman for your input!

Since this function is for test statistics

It isn't necessarily; that was just the initial use case that prompted this discussion. Another notable example is Bootstrap, which defines a statistic function that returns the function used to compute the statistic (which needn't be a test statistic) on each sample. For example, statistic(bootstrap(mean, randn(20), BasicSampling(100))) == mean. Bootstrap could extend statistic from StatsAPI, but it probably wouldn't make sense to extend teststatistic/teststat, as that's insufficiently general.

Aug 06 '23 18:08 ararslan
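
To make the two readings of a generic `statistic` concrete, here is a rough sketch in which a single function name is extended for two unrelated input types. Everything in it is an assumption made for illustration: `MockStatsAPI`, `ToyTest`, `ToyBootstrap`, `toy_bootstrap`, and `average` are hypothetical and do not reproduce the actual StatsAPI, HypothesisTests, or Bootstrap code.

```julia
# Hypothetical stand-in for StatsAPI; the real package is not loaded here.
module MockStatsAPI
function statistic end
end

# Toy hypothesis test that stores its computed statistic.
struct ToyTest
    t::Float64
end

# Toy bootstrap result that remembers which function was applied to each resample.
struct ToyBootstrap{F}
    f::F
    estimates::Vector{Float64}
end

# Draw `n` resamples of `data` with replacement and apply `f` to each one.
function toy_bootstrap(f, data::Vector{Float64}, n::Int)
    estimates = [f(rand(data, length(data))) for _ in 1:n]
    return ToyBootstrap(f, estimates)
end

# Same function name, two methods; dispatch on the argument type picks the meaning.
MockStatsAPI.statistic(x::ToyTest) = x.t        # returns a number
MockStatsAPI.statistic(b::ToyBootstrap) = b.f   # returns a function

average(v) = sum(v) / length(v)
bs = toy_bootstrap(average, randn(20), 100)

test_value = MockStatsAPI.statistic(ToyTest(1.5))
resampled_function = MockStatsAPI.statistic(bs)
```

Dispatch resolves which method applies, which is the sense in which the input type disambiguates the name. The sketch also shows the other side of the argument: the two methods return different kinds of values (a number and a function), so generic code cannot rely on a common return type.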

I was not aware of this. What are other examples?

Probably the most problematic function is fit, for which we don't even document possible arguments. But most other functions in StatsAPI have a well defined signature so that's really an exception.

Aug 12 '23 13:08 nalimilan
