FSharp.Stats icon indicating copy to clipboard operation
FSharp.Stats copied to clipboard

[Feature Request] Add quantile function for normal distribution (inverse CDF)

Open kMutagene opened this issue 4 years ago • 0 comments

Is your feature request related to a problem? Please describe.

Q-Q Plots are a very handy exploratory plot when analyzing data. Currently, there is no way to construct these plots with FSharp.Stats functions only, as inverse CDF (quantile function) is missing.

To construct a (normal) Q-Q plot, first the empirical Quantiles are calculated by one of these methods for the ranked observations in the sample of interest:

image

Subsequently, each quantile for a ranked observation is used as input for the inverse CDF of a normal distribution (e.g. N(0,1)). These values are plotted against each other:

image

note that for the inverse CDF of the normal distribution, we need the inverse erf which is currently not there as well.

Describe the solution you'd like

Addition of:

  • inverse erf (or an approximation) as SpecialFunction
  • inverse normal CDF based on above
  • empirical quantile calculation functions (rankit etc.) in an adequate module (Distribution.Empirical?)

Describe alternatives you've considered

None. but here is a workaround to make it work as long as there are no built in functions:

Here is an inverse erf approximation i took from here:

let erfInv (f:float) =
    let s = sign f |> float
    let x = (1. - f) * (1. + f)
    let lnx = log x
    let tt1 = 2./(System.Math.PI * 0.147) + 0.5 * lnx
    let tt2 = 1./(0.147) * lnx

    (s * sqrt(-tt1 + sqrt(tt1*tt1 - tt2)))

Inverse CDF base don that:

let inverseCDFNorm mean sigma p = mean + ( sigma * (erfInv (2.* p - 1.) * (sqrt 2.)))

Code for above plot using rankit for a sample of 1000 observations from a normal distribution (should fall on a straight line on the plot):

#r "nuget: Plotly.NET, 2.0.0-beta9"
#r "nuget: FSharp.Stats, 0.4.0"

open Plotly.NET
open FSharp.Stats

let di = Distributions.Continuous.normal 100. 20.

let sample = List.init 1000 (fun x -> di.Sample())

let quants = 
    sample
    |> List.sort
    |> fun l ->
        l
        |> List.mapi (fun i x ->
            let r = float i + 1.
            (r - 0.5) / ( float l.Length)
        )

let norm =
    quants 
    |> List.map (inverseCDFNorm 0. 1.)

Distributions.Continuous.Normal.CDF 0. 1. 1.644669874

Chart.Point(
    norm,
    sample
    |> List.sort
)
|> Chart.withX_AxisStyle "theoretical quantiles"
|> Chart.withY_AxisStyle "observations"
|> Chart.Show

kMutagene avatar May 10 '21 13:05 kMutagene