MultipleTesting.jl
MultipleTesting.jl copied to clipboard
add missing-tolerant adjust method
This PR addresses issue #108 of including a method ignorant of missing
values in a vector of P values. The reason this method is necessary/useful is that when using split-apply-combine methods for categorical data grouping when missing
values are present, situations may arise where one of the grouped vectors contains only missing values and thus the resulting P value would be missing
as well.
The introduced method to adjust
creates a copy of the Pvalue vector, performs the correction method
on all the non-missing Pvalues, and overwrites the original nonmissing Pvalues in the copied vector, returning the modified copy with all the missing
values preserved in their original locations.
julia> a = [0.1, 0.2, missing, 0.4] ;
julia> adjust(a, Holm())
4-element Array{Union{Missing, Float64},1}:
0.30000000000000004
0.4
missing
0.4
This method should leave the original unaffected:
julia> b = adjust(rand(10), Holm())
10-element Array{Float64,1}:
1.0
1.0
1.0
1.0
1.0
0.18431430876447497
1.0
1.0
1.0
1.0
Thanks for this!
I have just one concern, that was already pointed out in #108 by @juliangehring. The behaviour here should depend a lot on how the missing
came about. If, say the i-th test is null, then is the missingness
independent of the p-value? If this is true for all nulls tests, then this PR is correct.
Otherwise, it is not. For example, iirc some eQTL software only computes p-values that are smaller than t
for some small t
; the rationale is that large p-values will never be rejected by a multiple testing procedure so it is not worth spending effort to compute them. In this case the missing
p-values should be replaced by 1
and not filtered out ( or they can be filtered out and then one can use the adjust(pvals, n, PValueAdjustmentMethod)
functionality in this package).
In any case, I do not mind having this PR as the default, but I think it should be made explicit to users.
That's a very good point I had not considered. It could, in theory, have a kwarg of a Bool to specify the handling of missing
as the ignore or include-as-1 method, with a suitable docstring explaining the difference. What are your thoughts on that?
@nignatiadis the committed change suggests a method accounting for the scenarios your describe:
julia> a = [0.1, 0.2, missing, 0.4] ;
julia> adjust(a, Holm())
4-element Array{Union{Missing, Float64},1}:
0.30000000000000004
0.4
missing
0.4
julia> adjust(a, Holm(), include_missing = false) # verbose version of default kwarg
4-element Array{Union{Missing, Float64},1}:
0.30000000000000004
0.4
missing
0.4
julia> adjust(a, Holm(), include_missing = true)
4-element Array{Float64,1}:
0.4
0.6000000000000001
1.0
0.8