Added x2c_mom reference implementation

Open DhanusML opened this issue 1 year ago • 2 comments

Overview

Reference implementation of the statistics routine x2c_mom

These changes enable the example em_gmm_dense_batch with the reference backend, so the example has been removed from the exclude list. The changes were tested on AWS Graviton3 with a gcc + OpenBLAS build.

Notation

The data is a matrix $X\in\mathbb{R}^{p\times n}$ whose columns are independently sampled $p$-dimensional vectors, so $x_{ij}$ denotes the $i$th component of the $j$th sample. The matrix $X$ is assumed to be stored in column-major order.

1. x2c_mom

The variance estimator is a $p$-dimensional vector whose $i$th component is $$v_i = \frac{1}{n-1}\sum_{j=1}^n (x_{ij} - \mu_i)^2,$$ where $\mu_i = \frac{1}{n}\sum_{j=1}^n x_{ij}$ is the mean of the $i$th component. The implementation first computes the second raw sum $S^{(2)}_i := \sum_{j=1}^n x_{ij}^2$ and the mean $\mu_i$, and then uses $$v_i = \frac{S^{(2)}_i}{n-1} - \frac{n}{n-1}\mu_i^2 = \frac{S^{(2)}_i}{n-1}-\frac{(S^{(1)}_i)^2}{n(n-1)}$$ to compute the variance, where $S^{(1)}_i := \sum_{j=1}^n x_{ij} = n\mu_i$ is the first raw sum.
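The formula above can be illustrated with a minimal Python sketch. It is not the code from this change: the function name `x2c_mom_sketch` and its signature are hypothetical, and it assumes $X$ is passed as a flat column-major list, so that $x_{ij}$ sits at index `i + j * p`.

```python
def x2c_mom_sketch(x, p, n):
    """Illustrative one-pass mean/variance (hypothetical helper, not the oneDAL API).

    x : flat list of length p * n holding X in column-major order,
        i.e. x[i + j * p] is component i of sample j.
    Returns (mean, variance), each a list of length p.
    """
    if n < 2:
        raise ValueError("the unbiased variance estimator needs n >= 2")
    if len(x) != p * n:
        raise ValueError("x must contain exactly p * n values")

    s1 = [0.0] * p  # first raw sums  S^(1)_i = sum_j x_ij
    s2 = [0.0] * p  # second raw sums S^(2)_i = sum_j x_ij^2

    # Walk the samples (columns) one by one; each column is contiguous in memory.
    for j in range(n):
        base = j * p
        for i in range(p):
            val = x[base + i]
            s1[i] += val
            s2[i] += val * val

    mean = [s / n for s in s1]
    # v_i = S^(2)_i / (n - 1) - (S^(1)_i)^2 / (n * (n - 1))
    variance = [s2[i] / (n - 1) - s1[i] * s1[i] / (n * (n - 1)) for i in range(p)]
    return mean, variance


# Two components (p = 2), three samples (n = 3): columns (1, 2), (2, 4), (3, 6).
mean, variance = x2c_mom_sketch([1.0, 2.0, 2.0, 4.0, 3.0, 6.0], p=2, n=3)
```

For this input the first component takes the values 1, 2, 3 and the second 2, 4, 6, so the means are 2 and 4 and the variances are 1 and 4. Note that subtracting two raw-sum terms can lose precision when the mean is large relative to the spread of the data; the sketch does not attempt to guard against that.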

DhanusML avatar Jul 01 '24 07:07 DhanusML