cf-python Set numerical tolerance when doing comparisons with cf.Query objects

Currently, the numerical queries cf.le, cf.ge and cf.wi are susceptible to rounding issues, as the inequality operators can not be modified by the settings of rtol and atol.

This could be changed by replacing the operations with a strict inequality operation and a numerically tolerant equality operation:

E.g.

cf.le(5) = 5.00000000000001

would effectively be implemented as

5 < 5.00000000000001 | numpy.allclose(5, 5.00000000000001, rtol=rtol, atol=atol)

This would help greatly when using cf.Query in, for example, subspace definitions.

API: the functions should have rtol and atol keyword parameters. (cf.wi(3, 4, rtol=45, atol=0.0002))

Possible issue: Under this scheme, cf.wo would not be the exact inverse of cf.wi

Dec 17 '20 17:12 davidhassell

Possible issue: Under this scheme, cf.wo would not be the exact inverse of cf.wi

Ooh that sounds potentially a bit nasty... fun for a later release I suppose!

Dec 17 '20 17:12 sadielbartholomew

Currently, == and != are on cf.Data objects are always numerically tolerant, using the current environmental settings of rtol and atol (which default to the system epsilon) https://github.com/NCAS-CMS/cf-python/blob/master/cf/data/data.py#L4790-L4795

                if method == '__eq__':
                    array0 = numpy.isclose(array0, array1, rtol=rtol, atol=atol)
                elif method == '__ne__':
                    array0 = ~numpy.isclose(array0, array1, rtol=rtol, atol=atol)

Perhaps this is not appropriate. all six comparison operation or should be intolerant, as they are in numpy, This would open the way up for cf.Query objects to provide a way of setting tolerance on a case-by-case basis. Something like:

cf.eq(6) - not tolerant
cf.eq(6, tol=True) - tolerant, uses environment settings of atol, rtol
cf.eq(6, rtol=0.000001, atol=1e-9) - tolerant, use the given values for rtol, atol

and similarly for the other five comparison operators.

Feb 08 '21 09:02 davidhassell

Interesting points, @davidhassell. In my eyes the ideal case is to allow full configuration of the tolerance (or lack of) in some simple and intuitive way, and your idea:

This would open the way up for cf.Query objects to provide a way of setting tolerance on a case-by-case basis. Something like: cf.eq(6) - not tolerant cf.eq(6, tol=True) - tolerant, uses environment settings of atol, rtol cf.eq(6, rtol=0.000001, atol=1e-9) - tolerant, use the given values for rtol, atol

sounds like a neat way to do that.

The one point I think that might be debatable about it is whether the default should be to have no tolerance (as suggested above i.e. tol=False), or the global tolerance (i.e. tol=True), as I think different users might prefer different things. Could that in itself be a configuration option (a Boolean global item default_tolerant or something) or am I being overzealous on configurability (it's certainly possible)...

Either way, I support the suggestions made.

Feb 08 '21 14:02 sadielbartholomew

... re-reading my comment I am thinking I might have missed the point by questioning the default being zero tolerance.

I thought I'd check the Python Array API Standard (though you may already have done so) in case there was anything there to guide, but there's nothing on tolerance per-se in the location I'd expect it, namely: https://data-apis.github.io/array-api/latest/design_topics/accuracy.html.

Feb 08 '21 14:02 sadielbartholomew

I hadn't checked the Array API Standard, just what I knew of numpy - I guess it's no surprise that they are consistent. I like your idea of a "default tolerant" config variable. I think things are a bit inconsistent right now. For example, Data.equals is always tolerant, but comparisons in cf.aggregate are, by default, not (although they can be forced to be, at the expense of performance).

I don't think we should leap into anything soon - rather think about it after the migration to dask has occured. Best not change our API before then, unless we really have to.

Feb 08 '21 16:02 davidhassell

Fixed by #674

Nov 22 '23 11:11 davidhassell