`exceptions`-based exceptions
Rather than Nothing/0/NaN etc. (the first option being way better than the others), it would be great to generalize code that may throw to the MonadThrow class from exceptions.
This way, functions using throwM (e :: Exception) would have the signature MonadThrow m => ... -> m ( ... ), where m may become Maybe, or Either e, or even IO, according to the calling context.
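To make the idea concrete, here is a minimal sketch (not code from the library). `StatsError` and `safeMean` are hypothetical names invented for illustration; only `MonadThrow` and `throwM` come from `exceptions`. Note that the `Either` instance in `exceptions` is for `Either SomeException`, so the caller gets the error wrapped in `SomeException`.

```haskell
import Control.Exception (Exception, SomeException)
import Control.Monad.Catch (MonadThrow, throwM)

-- Hypothetical error type, for illustration only.
data StatsError = EmptySample
  deriving (Show, Eq)

instance Exception StatsError

-- Arithmetic mean that reports an empty input through MonadThrow
-- instead of returning NaN or calling 'error'.
safeMean :: MonadThrow m => [Double] -> m Double
safeMean [] = throwM EmptySample
safeMean xs = pure (sum xs / fromIntegral (length xs))

-- The caller picks the monad; the function itself does not change.
asMaybe :: Maybe Double
asMaybe = safeMean []

asEither :: Either SomeException Double
asEither = safeMean [1, 2, 3]

asIO :: IO Double
asIO = safeMean [1, 2, 3]
```

In `Maybe` the failure collapses to `Nothing`, in `Either SomeException` the exception value is kept, and in `IO` it is thrown as an ordinary runtime exception.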
Related: #128 , #100 , #111 , #118 ...
That's an excellent suggestion!
I've started addressing this here: https://github.com/DataHaskell/statistics/tree/exceptions-not-error
I'm actually halfway through implementing it. The thing is, once you touch S.Sample you need to adjust basically everything.
Yes, I noticed, error is used pretty much throughout. We could skip refactoring the input validation parts for now (i.e. zero input size or negative parameters etc.) and focus on the important ones, e.g. the NaN correlations etc. For example, I've replaced Sample.correlation with this:
-- | Correlation coefficient for sample of pairs. Also known as
--   Pearson's correlation. For empty sample it's set to zero.
correlation :: (G.Vector v (Double,Double), G.Vector v Double, MonadThrow m)
            => v (Double,Double)
            -> m Double
correlation xy
  | n == 0        = pure 0
  | nearZero varX = throwM $ NaNE "Variance of X == 0"
  | nearZero varY = throwM $ NaNE "Variance of Y == 0"
  | otherwise     = pure corr
  where
    corr        = cov / sqrt (varX * varY)
    n           = G.length xy
    (xs,ys)     = G.unzip xy
    (muX,varX)  = meanVariance xs
    (muY,varY)  = meanVariance ys
    cov         = mean $ G.zipWith (*)
                    (G.map (\x -> x - muX) xs)
                    (G.map (\y -> y - muY) ys)
{-# SPECIALIZE correlation :: U.Vector (Double,Double) -> Maybe Double #-}
{-# SPECIALIZE correlation :: V.Vector (Double,Double) -> Maybe Double #-}
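For anyone who wants to try the pattern without the vector machinery, here is a self-contained, list-based sketch of the same guard structure. Everything besides the `NaNE` constructor name is an assumption: the `StatsException` type, the `nearZero` tolerance, and the helpers `mean`, `variance`, `correlationL` and `describe` are hypothetical stand-ins, not what the branch defines.

```haskell
import Control.Exception (Exception, SomeException, fromException)
import Control.Monad.Catch (MonadThrow, throwM)

-- Assumed shape of the error type; only the constructor name NaNE
-- appears in the snippet above.
newtype StatsException = NaNE String
  deriving (Show, Eq)

instance Exception StatsException

-- Hypothetical tolerance check standing in for nearZero.
nearZero :: Double -> Bool
nearZero x = abs x < 1e-12

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Population variance (divides by n).
variance :: [Double] -> Double
variance xs = mean [(x - mu) ^ (2 :: Int) | x <- xs]
  where mu = mean xs

-- List-based analogue of the guard structure above.
correlationL :: MonadThrow m => [(Double, Double)] -> m Double
correlationL xy
  | null xy       = pure 0
  | nearZero varX = throwM (NaNE "Variance of X == 0")
  | nearZero varY = throwM (NaNE "Variance of Y == 0")
  | otherwise     = pure (cov / sqrt (varX * varY))
  where
    (xs, ys) = unzip xy
    muX      = mean xs
    muY      = mean ys
    varX     = variance xs
    varY     = variance ys
    cov      = mean (zipWith (\x y -> (x - muX) * (y - muY)) xs ys)

-- At Either SomeException the typed error can be recovered.
describe :: Either SomeException Double -> String
describe (Right r) = "r = " ++ show r
describe (Left e)  = case fromException e of
  Just (NaNE msg) -> "undefined correlation: " ++ msg
  Nothing         -> "other failure: " ++ show e
```

A caller that only cares about success can use it at `Maybe`, while one that wants the message can use `describe (correlationL xy)`.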
@Shimuuar would you like to join forces on this? I don't have an efficient implementation in mind for Matrix.generateSym, though.
Sure, although I won't be able to do anything till Monday.
Hi @Shimuuar :) as discussed, if you point me to your working branch for this we can figure out how to collaborate :)
I just pushed branch exception2 (exception was a complete failure). It's mostly complete except for:
- Statistics.Sample: some functions are commented out, and I'm thinking about using type classes from monoid-statistics for things like calculating mean and variance in a single call (saving one evaluation of the mean). Having dedicated functions is not terribly good, since in that case we get a combinatorial explosion.
- Resampling. Again, I'm thinking about the jackknife, which is clearly monoidal (although that's obscured by the API)
- Bootstrap: didn't even touch it
- Regression: depends on resampling
- KruskalWallis test
- A few other things I certainly forgot about
monoid-statistics is in a rather poor state currently. I got lost in figuring out the numeric precision and performance of different algorithms for variance.
@Shimuuar Re. monoid-statistics ; did you know of foldl-statistics? https://hackage.haskell.org/package/foldl-statistics
Yes. The main difference is that monoid-statistics exposes accumulator types and allows merging estimates from several data sets without refolding them.
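As a rough illustration of what "merging without refolding" means, here is a sketch of a mean/variance accumulator with a `Monoid` instance. This is not the monoid-statistics API: the `MeanVar` type and all function names are hypothetical, and the merge step uses the standard pairwise update for count, mean and sum of squared deviations.

```haskell
-- Hypothetical accumulator; not the monoid-statistics API.
data MeanVar = MeanVar
  { mvCount :: !Int
  , mvMean  :: !Double
  , mvM2    :: !Double  -- sum of squared deviations from the mean
  } deriving (Show)

instance Semigroup MeanVar where
  MeanVar 0 _ _ <> b = b
  a <> MeanVar 0 _ _ = a
  MeanVar na ma sa <> MeanVar nb mb sb = MeanVar n m s
    where
      n     = na + nb
      delta = mb - ma
      fa    = fromIntegral na
      fb    = fromIntegral nb
      fn    = fromIntegral n
      -- Pairwise merge of two partial estimates.
      m     = ma + delta * fb / fn
      s     = sa + sb + delta * delta * fa * fb / fn

instance Monoid MeanVar where
  mempty = MeanVar 0 0 0

singleton :: Double -> MeanVar
singleton x = MeanVar 1 x 0

fromList :: [Double] -> MeanVar
fromList = foldMap singleton

-- Population variance; Nothing for an empty accumulator.
varianceOf :: MeanVar -> Maybe Double
varianceOf (MeanVar 0 _ _) = Nothing
varianceOf (MeanVar n _ s) = Just (s / fromIntegral n)
```

With this, `fromList xs <> fromList ys` gives the same estimate as `fromList (xs ++ ys)` up to rounding, and both mean and variance come out of a single pass.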
Aha! That's a clever thing to have. However, what do you think of setting up speed benchmarks before looking into adding streaming capabilities? I would like to start adding basic summary functionality to criterion-measurement soon, to make it self-contained.
Why, of course! Without benchmarks all performance statements are just hopes and prayers