Ensure consistency of operations with respect to `Inf`, `NA`, `NaN`

Open sebffischer opened this issue 1 year ago • 8 comments

Inspired by a Mastodon post about the behavior of `NA + NaN` and `NaN + NA` in R.

sebffischer • Aug 01 '24 09:08

Thankfully, for now this is definitely not an issue as we currently don't support NaN. This was intentional, as I'd like to keep the number of esoteric values rather minimal. In the same spirit as https://github.com/dgkf/R/issues/106, I'd prefer that these eventually become vectors of type unions.

dgkf • Aug 01 '24 16:08

This does raise the question of whether the language should enforce commutativity of mathematical operators.

Enforcing that they're commutative would mean that something like ggplot2, which (I think) has asymmetric operators, might need to slightly alter its API, but I don't think it would really deter any ggplot2-like tools.

That said, I think the ggplot2 API makes more sense with a `|>` anyway, so maybe this intuition is a nudge that enforcing commutative math operators would encourage more intuitive APIs altogether.

dgkf • Aug 01 '24 16:08

> Thankfully, for now this is definitely not an issue as we currently don't support NaN. This was intentional, as I'd like to keep the number of esoteric values rather minimal. In the same spirit as #106, I'd prefer that these eventually become vectors of type unions.

Even though we did not explicitly encode NaN in the language, it can still be the result of a mathematical operation. NaN is part of the IEEE 754 floating-point specification and is obtained by, e.g., calculating `0 / 0`. I think it is only available for floats, not for integers.
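To make the point about floats versus integers concrete, here is a minimal Rust sketch using only the standard library. The function names are made up for illustration and are not part of the interpreter.

```rust
/// Returns true when `x` is any IEEE 754 NaN.
/// NaN is the only floating-point value that compares unequal to itself.
fn is_nan_by_self_compare(x: f64) -> bool {
    x != x
}

/// Floating-point division never panics: 0.0 / 0.0 yields NaN
/// and a nonzero value divided by 0.0 yields an infinity.
fn float_div(a: f64, b: f64) -> f64 {
    a / b
}

/// Integers have no NaN. `checked_div` reports division by zero as `None`
/// instead of producing a special value.
fn int_div(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}
```

So any `f64`-backed double vector can produce NaN without the language ever defining it, while an integer vector needs its own convention for "no value".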

However, combining NaN with NAs seems to -- at least at first sight -- behave like we want it to:

> 0 / 0 + NA
[1] NA
> NA + 0 / 0
[1] NA
> 0 / 0
[1] NaN
> 

But I think this needs to be checked properly. We might need to pay some additional attention when coercing a Vector::Double to Vector::Integer. Because integers don't support NaN, as.integer(NaN) results in an NA_integer_ in R.
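A rough sketch of what "checked properly" could mean, in Rust. The `OptionNA` enum below is a simplified stand-in written for this example; the interpreter's real type and its arithmetic may differ. The range check in `as_integer` is an assumption modelled on R's behavior, not something taken from the codebase.

```rust
/// Simplified stand-in for the interpreter's `OptionNA<T>` (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum OptionNA<T> {
    Some(T),
    NA,
}

/// Addition where NA wins on either side, including against NaN,
/// so `NA + NaN` and `NaN + NA` agree by construction.
fn add(lhs: OptionNA<f64>, rhs: OptionNA<f64>) -> OptionNA<f64> {
    match (lhs, rhs) {
        (OptionNA::Some(l), OptionNA::Some(r)) => OptionNA::Some(l + r),
        _ => OptionNA::NA,
    }
}

/// Double-to-integer coercion that maps NaN, infinities and out-of-range
/// values to NA. A bare `x as i32` would silently turn NaN into 0 and
/// saturate out-of-range values.
fn as_integer(x: OptionNA<f64>) -> OptionNA<i32> {
    match x {
        OptionNA::Some(v)
            if v.is_finite() && v > i32::MIN as f64 && v < i32::MAX as f64 + 1.0 =>
        {
            // Truncate toward zero, as R's as.integer() does.
            OptionNA::Some(v.trunc() as i32)
        }
        _ => OptionNA::NA,
    }
}
```

The explicit guard matters because Rust's `as` cast from `f64` to `i32` is defined to return 0 for NaN, which would otherwise look like a valid integer.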

> This does open up a question for whether mathematical operator commutative-ness should be enforced by the language.

I think I like it! If I remember correctly, someone from ggplot2 (maybe even Hadley) once said that they kind of regretted using + in ggplot2 because the operator is not commutative (but don't quote me on that).

sebffischer • Aug 01 '24 16:08

> 0 / 0
[1] NaN

:eyes:

How did that get in there! It's surely just the internal `f64`, but I never realized it snuck in. I'd probably have to defer to someone who does more stats algorithm development than I do to learn how useful NaNs are. Personally, I never really require them, but I'm sure in modelling tools they're very handy.

It would be nice to reduce the different exotic values where possible, but I can also see some value in holding on to it for `/0` scenarios.

dgkf • Aug 01 '24 17:08

> It would be nice to reduce the different exotic values where possible.. but I can also see some value in holding on to it for /0 scenarios.

In principle yes, but even if we wanted to, we could not get rid of NaN without paying a significant overhead. These special floating-point values are produced and propagated natively by the CPU's floating-point instructions, so removing them would mean checking the result of every operation ourselves. If anything, we should replace the floating-point NA with a NaN, as that should be more efficient.

sebffischer • Aug 02 '24 12:08

An informative document on NaN: https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/nan-propagation.pdf

sebffischer • Oct 09 '24 05:10

One main learning here is that double NAs can be represented more efficiently than by using `OptionNA<f64>`. The IEEE 754 standard distinguishes different kinds of NaN (quiet and signalling), and a NaN also carries payload bits that can store extra information. R uses the payload 1954 to represent a double NA. When payload propagation is ensured (RISC-V processors unfortunately don't enforce this; it was mentioned in the article above as well as here), NA propagation does not have to be handled by us. I believe this is considerably more efficient than layering it on top, as is done with `OptionNA<f64>`.
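As an illustration of the payload idea, here is a minimal Rust sketch. The bit pattern follows R's convention for a double NA (exponent bits all set, low 32 bits equal to 1954); the constant and function names are invented for this example and are not the interpreter's API.

```rust
/// Bit pattern of R's double NA: all exponent bits set and the low
/// 32 bits holding the payload 1954 (0x7A2). The name is hypothetical.
const NA_REAL_BITS: u64 = 0x7FF0_0000_0000_07A2;

/// Builds the NA value as a plain `f64`, with no wrapper type.
fn na_real() -> f64 {
    f64::from_bits(NA_REAL_BITS)
}

/// A value is NA when it is a NaN whose low 32 bits carry 1954.
/// The quiet bit is deliberately ignored, since arithmetic may set it.
fn is_na(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & 0xFFFF_FFFF) == 1954
}

/// A NaN that does not carry the NA payload, e.g. the result of 0.0 / 0.0.
fn is_plain_nan(x: f64) -> bool {
    x.is_nan() && !is_na(x)
}
```

Whether something like `is_na(na_real() + 1.0)` holds depends on the hardware preserving the payload, and Rust itself makes no guarantee about NaN payloads surviving arithmetic, so that part would need to be tested per target rather than assumed.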

sebffischer • Oct 09 '24 06:10

https://github.com/rick-de-water/nonany would allow for integer NA treatment similar to R's.
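For reference, R reserves the smallest 32-bit integer as `NA_integer_`. The hand-rolled sketch below shows that sentinel idea with the standard library only. It does not use the nonany crate's API, and `IntNA` is a hypothetical name.

```rust
/// Hypothetical NA-aware integer: `i32::MIN` is reserved as NA,
/// which is the convention R uses for NA_integer_.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IntNA(i32);

impl IntNA {
    const NA: IntNA = IntNA(i32::MIN);

    /// Wraps a raw value. Passing `i32::MIN` yields NA.
    fn new(v: i32) -> IntNA {
        IntNA(v)
    }

    fn is_na(self) -> bool {
        self.0 == i32::MIN
    }

    /// Addition that propagates NA from either operand.
    /// Overflow is also mapped to NA in this sketch.
    fn add(self, other: IntNA) -> IntNA {
        if self.is_na() || other.is_na() {
            return IntNA::NA;
        }
        match self.0.checked_add(other.0) {
            Some(v) => IntNA(v),
            None => IntNA::NA,
        }
    }
}

/// Sizes in bytes of the sentinel type and of a tagged `Option<i32>`.
fn sizes() -> (usize, usize) {
    (std::mem::size_of::<IntNA>(), std::mem::size_of::<Option<i32>>())
}
```

The sentinel version stays at 4 bytes per element, whereas `Option<i32>` needs 8 because the tag cannot be folded into the value. A niche-aware type like the one the linked crate provides would presumably let `Option` reach the same compact layout.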

sebffischer • Oct 10 '24 16:10