BabelStream

Dot verification fails with single precision

Open jrprice opened this issue 8 years ago • 10 comments

We probably just need to increase the tolerance. The error will also be proportional to the size of the arrays (unlike with the other kernels), so we need to make sure whatever error checking tolerance we use is robust enough to avoid these sorts of false positives for any sort of input.

Validation failed on sum. Error 0.000209808
Sum was 39.7910385131836 but should be 39.7912483215332
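
To put the reported numbers in context, the sketch below computes the absolute and relative error from the two values in the log. The helper names `absolute_error` and `relative_error` are hypothetical and are not taken from the BabelStream source.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: absolute difference between computed and expected.
double absolute_error(double got, double expected) {
  return std::fabs(got - expected);
}

// Hypothetical helper: error relative to the expected value.
// The expected value must be non-zero.
double relative_error(double got, double expected) {
  assert(expected != 0.0);
  return std::fabs(got - expected) / std::fabs(expected);
}
```

For the values above, `absolute_error(39.7910385131836, 39.7912483215332)` is about 2.1e-4 and `relative_error` is about 5.3e-6, which is a few tens of single-precision machine epsilons (about 1.19e-7).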

jrprice avatar Dec 13 '16 23:12 jrprice

We currently check that the sum array is within 1.0E-8 of the expected value for doubles and floats. We could either:

  1. Use 1.0E-5 for floats and 1.0E-8 for doubles
  2. Factor in the array size somehow

Option 1 is simple but might hide errors. If the arrays contain correct values, then a reduction result that is merely close may be good enough for this benchmark. For option 2, it might be hard to quantify how the tolerance should scale with the array size.
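
A minimal sketch of both options, under some assumptions: the function names are hypothetical, the tolerance is applied as a relative error (an assumption, not something this thread specifies), and option 2 uses a first-order worst-case growth of `n * epsilon`, which is only one possible way to factor in the array size.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <type_traits>

// Option 1: a fixed tolerance per data type (values from the list above).
template <typename T>
constexpr double fixed_tolerance() {
  return std::is_same<T, float>::value ? 1.0E-5 : 1.0E-8;
}

// Option 2 (assumed form): tolerance grows linearly with the number of
// elements, following a worst-case rounding bound for sequential summation.
template <typename T>
double scaled_tolerance(std::size_t n) {
  assert(n > 0);
  return static_cast<double>(n) *
         static_cast<double>(std::numeric_limits<T>::epsilon());
}

// Compare the reduction result against the expected value using a
// relative tolerance (assumption: gold is non-zero).
template <typename T>
bool dot_within(T sum, T gold, double rel_tol) {
  const double s = static_cast<double>(sum);
  const double g = static_cast<double>(gold);
  return std::fabs(s - g) <= rel_tol * std::fabs(g);
}
```

Two observations from this sketch. Read as an absolute tolerance, 1.0E-5 would still reject the result reported above (absolute error about 2.1e-4), whereas as a relative tolerance it accepts it. The linear bound in `scaled_tolerance` becomes very loose for large arrays: for `float` and 2^25 elements it evaluates to 4.0, which would accept almost anything, and that illustrates why option 2 is hard to quantify.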

tomdeakin avatar Feb 25 '17 14:02 tomdeakin

Would it be possible to give "sum" the double datatype irrespective of the input args ("double or float") so that it gives accurate results? "sum" is a user variable that is only compared with the "goldSum" value, so it should not matter. I mean, change "Template sum" to "double sum".
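
A host-side sketch of this idea, assuming a plain sequential loop rather than BabelStream's actual device kernels: the accumulator type `Acc` is chosen separately from the element type `T`. The function name `dot` and its template parameters are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product whose accumulator type Acc is independent of the element
// type T. With T = float and Acc = double, each product and the running
// sum are computed in double precision.
template <typename T, typename Acc>
Acc dot(const std::vector<T>& a, const std::vector<T>& b) {
  assert(a.size() == b.size());
  Acc sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += static_cast<Acc>(a[i]) * static_cast<Acc>(b[i]);
  }
  return sum;
}
```

Calling `dot<float, double>(a, b)` and `dot<float, float>(a, b)` on the same large arrays shows how much of the error comes from accumulating in single precision rather than from the array contents.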

Srinivasuluch avatar May 22 '17 05:05 Srinivasuluch

For devices which do not support double precision would this not pose a problem?

tomdeakin avatar May 25 '17 10:05 tomdeakin

Hi, is this issue still being worked on?

zyzzyxdonta avatar Jul 20 '20 14:07 zyzzyxdonta

Yes, but we've not come up with a satisfactory solution yet.

tomdeakin avatar Jul 29 '20 13:07 tomdeakin

Thanks for your reply. Am I right in assuming that despite the verification failing, my measurements are still valid?

zyzzyxdonta avatar Jul 29 '20 13:07 zyzzyxdonta

If it's just the reduction that fails (dot), and the other kernels are OK then the contents of the arrays should be correct. If the result is close enough on inspection but fails because of the tolerance then it's probably fine. If the result is 0.0 or some other nonsense number then it might have done something really wrong...

tomdeakin avatar Jul 29 '20 13:07 tomdeakin

Alright, thanks a lot!

zyzzyxdonta avatar Jul 29 '20 13:07 zyzzyxdonta

@zjin-lcf suggested using different tolerances for the reduction result based on the data type (option 1 above).

tomdeakin avatar May 19 '21 13:05 tomdeakin