MultipleTesting.jl icon indicating copy to clipboard operation
MultipleTesting.jl copied to clipboard

add missing-tolerant adjust method

Open pdimens opened this issue 3 years ago • 3 comments

This PR addresses issue #108 of including a method ignorant of missing values in a vector of P values. The reason this method is necessary/useful is that when using split-apply-combine methods for categorical data grouping when missing values are present, situations may arise where one of the grouped vectors contains only missing values and thus the resulting P value would be missing as well.

The introduced method to adjust creates a copy of the Pvalue vector, performs the correction method on all the non-missing Pvalues, and overwrites the original nonmissing Pvalues in the copied vector, returning the modified copy with all the missing values preserved in their original locations.

julia> a = [0.1, 0.2, missing, 0.4] ;

julia> adjust(a, Holm())
4-element Array{Union{Missing, Float64},1}:
 0.30000000000000004
 0.4
  missing
 0.4

This method should leave the original unaffected:

julia> b = adjust(rand(10), Holm())
10-element Array{Float64,1}:
 1.0
 1.0
 1.0
 1.0
 1.0
 0.18431430876447497
 1.0
 1.0
 1.0
 1.0

pdimens avatar Jul 20 '20 19:07 pdimens

Thanks for this!

I have just one concern, that was already pointed out in #108 by @juliangehring. The behaviour here should depend a lot on how the missing came about. If, say the i-th test is null, then is the missingness independent of the p-value? If this is true for all nulls tests, then this PR is correct.

Otherwise, it is not. For example, iirc some eQTL software only computes p-values that are smaller than t for some small t; the rationale is that large p-values will never be rejected by a multiple testing procedure so it is not worth spending effort to compute them. In this case the missing p-values should be replaced by 1 and not filtered out ( or they can be filtered out and then one can use the adjust(pvals, n, PValueAdjustmentMethod) functionality in this package).

In any case, I do not mind having this PR as the default, but I think it should be made explicit to users.

nignatiadis avatar Jul 20 '20 19:07 nignatiadis

That's a very good point I had not considered. It could, in theory, have a kwarg of a Bool to specify the handling of missing as the ignore or include-as-1 method, with a suitable docstring explaining the difference. What are your thoughts on that?

pdimens avatar Jul 20 '20 20:07 pdimens

@nignatiadis the committed change suggests a method accounting for the scenarios your describe:

julia> a = [0.1, 0.2, missing, 0.4] ;

julia> adjust(a, Holm())
4-element Array{Union{Missing, Float64},1}:
 0.30000000000000004
 0.4
  missing
 0.4

julia> adjust(a, Holm(), include_missing = false)    # verbose version of default kwarg
4-element Array{Union{Missing, Float64},1}:
 0.30000000000000004
 0.4
  missing
 0.4

julia> adjust(a, Holm(), include_missing = true)
4-element Array{Float64,1}:
 0.4
 0.6000000000000001
 1.0
 0.8

pdimens avatar Jul 21 '20 00:07 pdimens