Documentation: guidelines for interpreting R^2 and stddev
In running the ~2000 criterion benchmarks currently on Stackage, we've run into the problem that some of them are clearly nonlinear.
We are trying to develop a data-cleaning step that will throw these out before comparing two different runs of the full benchmark set. Once we find good thresholds, I would like to put together some guidelines and share them with criterion users.
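As a rough illustration of what such a cleaning step might look like: criterion reports a goodness-of-fit (R²) for the regression it fits to each benchmark, so one simple approach is to discard any benchmark whose R² falls below a cutoff. The record layout and the 0.99 threshold below are illustrative assumptions, not criterion's actual output format or an agreed-upon value.

```python
# Sketch of a data-cleaning step: drop benchmarks whose iteration-time
# regression fits poorly (low R^2), since a poor linear fit suggests the
# benchmark is nonlinear and its summary statistics are unreliable.
# The threshold and record format here are hypothetical.

R2_THRESHOLD = 0.99  # illustrative cutoff; would need tuning on real data


def filter_linear(benchmarks, threshold=R2_THRESHOLD):
    """Split benchmarks into (kept, dropped) by regression R^2."""
    kept, dropped = [], []
    for bench in benchmarks:
        if bench["r_squared"] >= threshold:
            kept.append(bench)
        else:
            dropped.append(bench)
    return kept, dropped


# Example with made-up numbers: the second run looks nonlinear.
runs = [
    {"name": "fib/20", "r_squared": 0.999},
    {"name": "ioHeavy", "r_squared": 0.91},
]
kept, dropped = filter_linear(runs)
```

Whether a single global threshold works, or whether it should also take the standard deviation into account, is exactly the open question here.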
This goes along with other documentation updates, e.g. #92 and #95.
CC @RyanGlScott @vollmerm
@rrnewton almost two years late, but I wonder how you proceeded with your data-cleaning step? Rather than throwing the performance datasets away, it would be very instructive if they were made publicly available.