fishtest
Tests bias - origins
Hi everyone,
Since I started running tests on fishtest, I have been wondering about the origin of the "bias" (or "distribution") of results across the jobs of a test. I don't know whether this has already been studied, but large differences are clearly visible when we compare jobs belonging to the same test.
I suppose we can imagine 3 sources of the bias:
1. Opening position bias: depending on the randomly chosen position, the master or the test version of Stockfish can be stronger.
2. Hardware bias within the same worker: hyperthreading and several other hardware details can lead to slight random nps differences between master and the tested version.
3. Hardware bias between different workers: results may be affected by CPU speed, amount of memory, etc.
In the end, I think that if we identify the source of the bias, we can try to improve the fishtest procedure, or at least understand it better. Less bias should lead to a more accurate Elo evaluation and less garbage. One interesting experiment would be to launch a test in which every game starts from the same opening position and see how much the bias changes, but it seems to me that we cannot do this. We could also try restricting a test to a single worker, etc.
We do attempt to correct for 2 and 3 by measuring nps and rescaling the time accordingly.
That doesn't help if base and test have different properties (like a new net arch).
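To make the nps rescaling idea concrete, here is a minimal sketch. It is not the actual fishtest worker code: the function name `scale_time_control`, its parameters, and the reference nps value are all assumptions chosen for illustration. The idea is only that a worker measured as faster than the reference gets proportionally less clock time, so each game sees roughly the same number of searched nodes.

```python
def scale_time_control(base_seconds, increment_seconds, measured_nps, reference_nps):
    """Rescale a time control so machines of different speed search a
    comparable number of nodes per game.

    Hypothetical helper for illustration only; the real worker may differ.
    """
    if measured_nps <= 0 or reference_nps <= 0:
        raise ValueError("nps values must be positive")
    # A faster machine (measured_nps > reference_nps) gets a factor below 1,
    # i.e. less time; a slower machine gets more time.
    factor = reference_nps / measured_nps
    return base_seconds * factor, increment_seconds * factor


# Example: a worker twice as fast as the (assumed) reference gets half the time.
base, inc = scale_time_control(10.0, 0.1, measured_nps=2_000_000, reference_nps=1_000_000)
print(base, inc)
```

Note that one factor is applied to both engines in the game. If base and test reach different nps on the same machine (for example with a different net architecture), a single measured factor cannot equalize them, which is the limitation pointed out above.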