waldo
waldo copied to clipboard
Inconsistent handling of tolerance can lead to cryptic failure
Symptom
> expect_equal(0.105487503092834, 0.10548753, tolerance = 1.5e-7)
Error: 0.105487503092834 (`actual`) not equal to 0.10548753 (`expected`).
actual != expected but don't know how to show the difference
Encountered while using testthat
but debugging shows the problem lies in waldo
, and so filing this report here.
Summary Analysis
The outer calls lead to waldo
's num_equal
which applies the tolerance to $|x-y|/|y| < \delta$ as long as $|y|>\delta$ where $\delta$ is the tolerance. Equivalently, it checks $|x-y|<\delta |y|$. Having discovered the difference, it then turns it over to compare_numeric()
which, I think, attempts to show only as many digits as are necessary to show the difference. However, the logic here ignores the fact that the absolute difference that causes the test to fail is usually $\delta |y|$, and just uses $\delta$.
With the values shown above this means it's one digit short of what it needs to show the difference; since the numbers agree before that point there are no differences in the formatted values and so the list of (string) differences to be shown is empty. The result is the that compare_numeric()
reports the "don't know how to show the difference error."
Possible Fix
In min_digits()
,
https://github.com/r-lib/waldo/blob/e31e97de04105491dcb03a921911af2f4eb16dd2/R/compare-value.R#L161
use instead
if (!is.null(tolerance)) {
if (abs(y)>tolerance)
tolerance <- tolerance*abs(y)
n <- min(n, digits(tolerance))
}
Although this may solve the immediate problem, it does not address the fact that the logic for interpreting the tolerance is living in at least 2 separate places, which seems undesirable.
Fuller Analysis
Here's the call stack, with the innermost call on the top
num_equal takes a tolerance arg -> FALSE 3.2. compare_numeric produces actual != expected but don't know how to show the difference 3.1. num_equal says they don't match even though difference is -2.7e-8 and tol is 1.5e-7. x is 0.105, and so it must be doing relative tolerance. (it divided average absolute diff among those that differ by average y for same, unless average y < tolerance) x = 0.105487503092834 y = 0.10548753 3. compare_vector 2. compare_by_attr -> empty string
- compare_terminate-> empty string compare_structure waldo::compare waldo_compare expect_waldo_equal expect_equal
The numbers indicate calls made in sequence: compare_structure()
first called compare_terminate()
which returned an empty string, then compare_by_attr()
and finally compare_vector()
. The latter first called num_equal()
which said the arguments were not equal, and then compare_numeric()
which first decided there were differences and then, after processing them, decided there weren't any.
The formatting weirdness between the stack line starting with 2.
and the one with 1.
is not intentional or significant; I just don't know how to prevent it.
Related Issues
As my note on num_equal()
suggests, I found the interpretation of the tolerance surprising, consistent with #188 (at least with with scalar comparison I wasn't exposed to the average difference approach that it apparently uses). At least in waldo
it is documented; in testthat
the tolerance parameter is named, but absolutely no information on its exact meaning appears.