Inconsistent handling of tolerance can lead to cryptic failure

Open RossBoylan opened this issue 11 months ago • 0 comments

Symptom

> expect_equal(0.105487503092834, 0.10548753, tolerance = 1.5e-7)

Error: 0.105487503092834 (`actual`) not equal to 0.10548753 (`expected`).

actual != expected but don't know how to show the difference

Encountered while using testthat but debugging shows the problem lies in waldo, and so filing this report here.

Summary Analysis

The outer calls lead to waldo's num_equal which applies the tolerance to $|x-y|/|y| < \delta$ as long as $|y|>\delta$ where $\delta$ is the tolerance. Equivalently, it checks $|x-y|<\delta |y|$. Having discovered the difference, it then turns it over to compare_numeric() which, I think, attempts to show only as many digits as are necessary to show the difference. However, the logic here ignores the fact that the absolute difference that causes the test to fail is usually $\delta |y|$, and just uses $\delta$.

With the values shown above this means it's one digit short of what it needs to show the difference; since the numbers agree before that point there are no differences in the formatted values and so the list of (string) differences to be shown is empty. The result is the that compare_numeric() reports the "don't know how to show the difference error."

Possible Fix

In min_digits(), https://github.com/r-lib/waldo/blob/e31e97de04105491dcb03a921911af2f4eb16dd2/R/compare-value.R#L161 use instead

  if (!is.null(tolerance)) {
   if (abs(y)>tolerance)
       tolerance <- tolerance*abs(y)
    n <- min(n, digits(tolerance))
  }

Although this may solve the immediate problem, it does not address the fact that the logic for interpreting the tolerance is living in at least 2 separate places, which seems undesirable.

Fuller Analysis

Here's the call stack, with the innermost call on the top

num_equal takes a tolerance arg -> FALSE 3.2. compare_numeric produces actual != expected but don't know how to show the difference 3.1. num_equal says they don't match even though difference is -2.7e-8 and tol is 1.5e-7. x is 0.105, and so it must be doing relative tolerance. (it divided average absolute diff among those that differ by average y for same, unless average y < tolerance) x = 0.105487503092834 y = 0.10548753 3. compare_vector 2. compare_by_attr -> empty string

compare_terminate-> empty string compare_structure waldo::compare waldo_compare expect_waldo_equal expect_equal

The numbers indicate calls made in sequence: compare_structure() first called compare_terminate() which returned an empty string, then compare_by_attr() and finally compare_vector(). The latter first called num_equal() which said the arguments were not equal, and then compare_numeric() which first decided there were differences and then, after processing them, decided there weren't any.

The formatting weirdness between the stack line starting with 2. and the one with 1. is not intentional or significant; I just don't know how to prevent it.

Related Issues

As my note on num_equal() suggests, I found the interpretation of the tolerance surprising, consistent with #188 (at least with with scalar comparison I wasn't exposed to the average difference approach that it apparently uses). At least in waldo it is documented; in testthat the tolerance parameter is named, but absolutely no information on its exact meaning appears.

Mar 21 '24 22:03 RossBoylan

waldo waldo copied to clipboard

Inconsistent handling of tolerance can lead to cryptic failure

Symptom

Summary Analysis

Possible Fix

Fuller Analysis

Related Issues

waldo
waldo copied to clipboard