statistics
Bootstrap for linear regression could return NaN
The problem is simple. During resampling we can draw a sample in which all x values are the same. We then get a singular matrix when solving the linear equation, so the answer is NaN, which contaminates everything computed from it.
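To make the failure mode concrete, here is a minimal sketch that uses only `base`. It assumes a closed-form simple regression (`simpleOls` is a hypothetical helper written for illustration, not the library's `olsRegress`, which solves the system differently) and shows that a resample with constant x produces 0/0. `degenerateProb` is likewise a hypothetical helper: it gives the chance that one resample of n distinct x values, drawn with replacement, is constant.

```haskell
-- Closed-form fit of y = slope * x + intercept.
-- Hypothetical helper for illustration; NOT Statistics.Regression.olsRegress.
simpleOls :: [Double] -> [Double] -> (Double, Double)
simpleOls xs ys = (slope, intercept)
  where
    n         = fromIntegral (length xs) :: Double
    meanX     = sum xs / n
    meanY     = sum ys / n
    sxx       = sum [(x - meanX) * (x - meanX) | x <- xs]
    sxy       = sum [(x - meanX) * (y - meanY) | (x, y) <- zip xs ys]
    slope     = sxy / sxx                 -- 0/0 when every x is identical
    intercept = meanY - slope * meanX     -- NaN propagates from the slope

-- Probability that a resample of size n, drawn with replacement from
-- n distinct values, consists of a single repeated value: n * (1/n)^n.
degenerateProb :: Int -> Double
degenerateProb n = fromIntegral n ** fromIntegral (1 - n)

main :: IO ()
main = do
  let ys = [0.48, 0.96, 1.45, 1.95]
  -- Ordinary sample: finite slope and intercept.
  print (simpleOls [1, 2, 3, 4] ys)
  -- Degenerate resample (all x equal): both components are NaN.
  print (simpleOls [2, 2, 2, 2] ys)
  -- Chance that a single resample of 4 points is degenerate.
  print (degenerateProb 4)
  -- Chance that at least one of 1000 resamples is degenerate.
  print (1 - (1 - degenerateProb 4) ^ (1000 :: Int))
```

With only four data points, roughly one resample in 64 is degenerate, so a run of 1000 resamples is all but certain to hit at least one, and a single NaN is enough to poison any aggregate computed over the resampled fits.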
Originally reported as https://github.com/bos/criterion/issues/65
BTW, I had to update my minimal example in https://github.com/bos/criterion/issues/65#issuecomment-261001563 in order to make it work with statistics-0.14:
$ ghci
GHCi, version 8.0.2: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /Users/rscott/.ghci
λ> :m + System.Random.MWC Statistics.Regression Data.Word Statistics.Types
λ> import qualified Data.Vector.Unboxed as U
λ> :set -XOverloadedLists
λ> gen <- initialize ([1..1000] :: U.Vector Word32)
λ> bootstrapRegress gen 1000 (mkCL 0.95) olsRegress [[1.0,2.0,3.0,4.0]] [0.4834418371319771,0.9643802028149366,1.4471413176506758,1.9452479053288698]
([Estimate {estPoint = 0.48681793194264117, estError = ConfInt {confIntLDX = 5.879566259681779e-3, confIntUDX = 4.313440807164948e-3, confIntCL = mkCLFromSignificance 5.0000000000000044e-2}},Estimate {estPoint = -6.992014124988053e-3, estError = ConfInt {confIntLDX = 4.018643125892002e-2, confIntUDX = 9.495485574005886e-3, confIntCL = mkCLFromSignificance 5.0000000000000044e-2}}],Estimate {estPoint = 0.999930103564569, estError = ConfInt {confIntLDX = 5.246837438233065e-5, confIntUDX = NaN, confIntCL = mkCLFromSignificance 5.0000000000000044e-2}})