glow
glow copied to clipboard
Make allowedError / isEqual() independent of Tensor dynamic range
Tensor method isEqual() allows specifying a threshold of acceptable absolute error when comparing tensors: https://github.com/pytorch/glow/blob/master/include/glow/Base/Tensor.h#L871 This approach has a strong dependency on the dynamic range of the tensor under check. Tensors with small dynamic range are more sensitive to small errors than those with huge dynamic range. As a result, current solution cannot be applied systematically in all test cases. I propose to switch to (or at least add support for) a new check mechanism that allows specifying an error tolerance and is generic enough that can be applied independently of the test case. For instance, Symmetric Mean Absolute Percentage Error (SMAPE) could help providing this solution. This metric is both upper and lower bounded, so the result is easier to classify between acceptable (close to lower bound) or unacceptable (close upper bound).
I tried locally to switch from delta to SMAPE error threshold check and found a few tests that fail in the CI. For instance, Conv2D_PaddingSame from TFLiteImporterTest.
I0807 15:49:14.351974 68 Tensor.h:948] Tensors not equal: 1 out of 50 elements exceeded allowed error threshold 1e-06. Maximum error found was 4.62171e-06 at index 26: -0.00846339 vs. -0.00846347 I0807 15:49:14.352083 68 Tensor.h:954] Maximum relative error found was: 0.00027304 at index: 26: -0.00846339 v.s. -0.00846347
In this family of tests the isEqual()
check is invoked with allowedError set to 1e-6. This means that the accepted error delta is 1e-6. With SMAPE the same allowedError means we accept a delta error of 1e-6 / (abs(thisVal) + abs(otherVal)) which varies with the magnitude of the values under check. In the case of the Conv2D_PaddingSame element that fails delta is 8e-8 which passes the check but fails on SMAPE because 8e-8 / (0.00846339 + 0.00846347) = 4.7e-6 is higher than 1e-6.
Switching from delta to SMAPE check would require adjusting the usage of allowedError
in all the consumers of isEqual()
(at least the ones failing in the CI) because of the semantic change. It seems very few tests would require adjusting this threshold though.
A second non-invasive option could be the addition of an extra argument to the isEqual()
call to accept an allowedSMAPE
threshold and check either or both conditions based on the call configuration.
@et-nivard Thanks for investigating this. I'd be totally fine with making SMAPE the default. My only issue is that, while some tests as you point out will now fail, others likely need to have their error bound made tighter to be relevant. For example those that use large values likely had their allowed error set relatively big, because it's the absolute error allowed. So we'd want to do an inventory of all isEqual calls to make sure their bounds are relevant. And just to be clear, many of those bounds were set somewhat arbitrarily.