
Correct formula for standard error of mean

Open kif opened this issue 2 years ago • 1 comments

Since version 0.18, pyFAI computes means of intensities weighted by a normalization factor (solid angle * polarization * ...) and by the pixel splitting factor. Propagation of variance has been implemented, either from a statistical law (Poisson) or from the deviation from the mean in the ensemble of pixels falling in each bin; this has been available since v0.21. The standard deviation σ (STD hereafter) calculated from variance propagation should be correct, and it is used extensively in sigma-clipping.

The error reported after integration is the standard error of the mean (SEM), which corresponds to the standard deviation divided by √n in an unweighted formalism. In pyFAI 0.21 this formula was naively translated into a division by √Ω, where Ω = Σ ω.

The Wikipedia page on the weighted arithmetic mean gives the correct factor: https://en.wikipedia.org/wiki/Weighted_arithmetic_mean which results in this formula: SEM = STD * √(Σ ω²) / Ω

Since the weights ω are close to unity, this difference was never noticed ... until now. Hence this bug.
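To make the difference between the two formulas concrete, here is a minimal NumPy sketch. The function names `sem_naive` and `sem_weighted` are hypothetical and are not part of the pyFAI API; `std` stands for the STD of one bin and `weights` for the ω of the pixels contributing to it.

```python
import numpy as np


def sem_naive(std, weights):
    """SEM as STD / sqrt(Omega), with Omega = sum(w): the formula reported as a bug."""
    w = np.asarray(weights, dtype=float)
    return std / np.sqrt(w.sum())


def sem_weighted(std, weights):
    """SEM as STD * sqrt(sum(w**2)) / Omega: the weighted-mean formula."""
    w = np.asarray(weights, dtype=float)
    return std * np.sqrt((w * w).sum()) / w.sum()


# With unit weights both expressions reduce to STD / sqrt(n).
unit = np.ones(16)
print(sem_naive(2.0, unit), sem_weighted(2.0, unit))

# With weights away from unity they no longer agree.
half = np.full(16, 0.5)
print(sem_naive(2.0, half), sem_weighted(2.0, half))
```

With all weights equal to 1 the two functions agree, which is consistent with the bug having gone unnoticed; scaling every weight by the same constant leaves the weighted formula unchanged but changes the naive one.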

kif avatar Jan 31 '22 07:01 kif

To correct this bug, the factor Σ ω² needs to be calculated, which means changing the structure of every single rebinning engine and risks jeopardizing some of the performance optimizations done in Cython or OpenCL.

  • [ ] Histogram rebinning engines: 1 extra histogram to calculate, performance penalty of 20%
    • [ ] Python
    • [ ] Cython
    • [ ] OpenCL
  • [x] CSR Matrix multiplication: switch from 4 to 5 sums.
    • [x] Python: one extra multiplication: performance penalty of 20%
    • [x] Cython: No penalty expected
    • [x] OpenCL: float8 structures need to be re-arranged, using compensated arithmetic for signal and variance and non-compensated arithmetic for ω² and count.
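
As a rough illustration of what the extra accumulator involves, the sketch below rebins with `np.bincount` and carries five per-bin sums instead of four. It is not pyFAI code: the function name `rebin_five_sums`, its arguments, and the assumption that the four existing sums are signal, variance, Ω and count are all hypothetical, and pixel splitting is ignored (each pixel falls in exactly one bin).

```python
import numpy as np


def rebin_five_sums(bin_index, signal, variance, omega, nbins):
    """Accumulate five per-bin sums and the SEM/STD factor sqrt(sum(w**2)) / Omega.

    Hypothetical helper: bin_index holds the integer bin of each pixel,
    omega the weight of each pixel.
    """
    bin_index = np.asarray(bin_index, dtype=np.intp)
    omega = np.asarray(omega, dtype=float)

    sum_signal = np.bincount(bin_index, weights=signal, minlength=nbins)
    sum_variance = np.bincount(bin_index, weights=variance, minlength=nbins)
    sum_omega = np.bincount(bin_index, weights=omega, minlength=nbins)
    # The additional accumulator needed for the corrected formula.
    sum_omega2 = np.bincount(bin_index, weights=omega * omega, minlength=nbins)
    count = np.bincount(bin_index, minlength=nbins).astype(float)

    # Factor converting STD into SEM; NaN marks bins without any weight.
    factor = np.full(nbins, np.nan)
    np.divide(np.sqrt(sum_omega2), sum_omega, out=factor, where=sum_omega > 0)
    return sum_signal, sum_variance, sum_omega, sum_omega2, count, factor


sums = rebin_five_sums(
    bin_index=[0, 0, 1, 1, 1],
    signal=[10.0, 12.0, 5.0, 6.0, 7.0],
    variance=[10.0, 12.0, 5.0, 6.0, 7.0],
    omega=[1.0, 1.0, 0.5, 0.5, 0.5],
    nbins=3,
)
print(sums[-1])  # per-bin factor sqrt(sum(w**2)) / Omega
```

The only change relative to a four-sum scheme is the `sum_omega2` line, which is why the histogram engines need one extra histogram per integration.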

kif avatar Jan 31 '22 07:01 kif