Frustration-One-Year-With-R Non-numbers

Your article is extremely thorough. You touched on a few of these points already, but maybe there's something interesting in the notes below. This is not really an "issue", more of a comment.

# Things that are not floating point numbers ---

# In theory:
NaN  # Not A (floating point) Number.  IEEE 754 standard.
NA   # Placeholder for an unknown value. Invented by R. Logical.
NULL # Empty object (like an empty set).  Nothing.
Inf  # Infinity
-Inf # Negative infinity

# In practice:
# Any mental model of NA/NaN will fail you. Dante's Inferno.

length(NA)   # 1  Something there, but we don't know what.
length(NaN)  # 1  Something there, but not representable.
length(NULL) # 0  Nothing there.

sqrt(-1)             # NaN. 'i' in mathematics, not defined in floating point.

# NaN is an NA, but NA is not an NaN
is.nan(NA) # FALSE
is.na(NaN)  # TRUE

min(c())               # Inf
min(c(NA), na.rm=TRUE) # Inf
min(NaN)               # NaN

max(c())               # -Inf
max(c(NA), na.rm=TRUE) # -Inf
max(NaN)               # NaN

# https://en.wikipedia.org/wiki/Empty_sum
sum(NA)               # NA
sum(NA, na.rm=TRUE)   # 0    # Horrible

mean(NA)              # NA
mean(NA, na.rm=TRUE)  # NaN

var(NA)               # NA
var(NA, na.rm=TRUE)   # NA

# https://en.wikipedia.org/wiki/Empty_product
prod(NA)              # NA
prod(NA, na.rm=TRUE)  # 1    # Horrible

NA | TRUE   # TRUE
NA & FALSE  # FALSE

# https://en.wikipedia.org/wiki/Division_by_zero
0/0   # NaN
1/0   # Inf.  Shouldn't it be NaN?!

Inf >= NA  # NA.  If NA is placeholder, this should be TRUE!

NA * 0      # NA. Because NA could be Inf, and Inf*0 is NaN. Right???
NA ^ 0      # 1
NaN ^ 0     # 1

NA %in% 1:3 # FALSE
match(NA, 1:3) # NA

matrix(nrow=2,ncol=2)  # matrix initializes with NAs
vector(mode="numeric", length=2) # vector initializes with 0s

# NULL can be assigned to an object.
x <- NULL
x
# NULL assigned to list elements removes them.
x <- list(1,"a",TRUE)
x[[1]] <- NULL
x
# NULL assigned to data.frame columns removes them
x <- data.frame(a=1:2, b=3:4)
x
x$a <- NULL
x

# https://blog.revolutionanalytics.com/2016/07/understanding-na-in-r.html
# https://stats.stackexchange.com/questions/5686/what-is-the-difference-between-nan-and-na
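
The "Logical" remark about NA and the NA/NaN asymmetry above can be checked directly. This is only a small sketch using base R's typed NA constants (`NA_integer_`, `NA_real_`, `NA_character_`), which the notes above don't mention:

```r
# A bare NA is a logical value; base R also provides typed variants.
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# Inside a vector, NA is coerced to the type of its neighbours.
typeof(c(1, NA))       # "double"
typeof(c("a", NA))     # "character"

# Every NaN counts as NA, but a numeric NA is not NaN.
is.na(NaN)             # TRUE
is.nan(NA_real_)       # FALSE
```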

Mar 25 '22 19:03 kwstat

Very thorough post, and good notes here too! I couldn't help but point out that you CAN take the square root of -1 if you represent it as a complex number. I don't remember the last time I saw complex numbers being used in R, but they are there 😄

sqrt(0i-1)
#> [1] 0+1i

Mar 26 '22 17:03 EmilHvitfeldt

A good read! I'll definitely link to this in a future version. I fully admit ignorance regarding these non-numbers, so it's no wonder that I never discovered these issues myself. I know that NA must always be handled with great care, but I always take NaN as a warning sign that I've committed a grave error and must fix it before taking any other steps.

Speaking of warnings, I will defend R by saying that many of these examples throw warnings that you've not shown. However, a lot of them don't, so it's not like R is totally innocent. There's also a handful that I can kind of explain. For example,

sum(NA)               # NA
sum(NA, na.rm=TRUE)   # 0    # Horrible

is bad, but I can see their reasoning. Sum of NA being NA makes sense and if you remove NA (as in the second example), then you've got an empty sum, which is certainly 0. The mean example probably returns NaN because it'll boil down to sum(empty_set)/length(empty_set) which is bound to be division by 0. I've got no such defence for the var example, but the use parameter in its documentation seems very relevant.
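
A rough sketch of that reasoning, plus a way to see which calls warn. It does the sum/length arithmetic by hand, which is only a guess at what `mean` boils down to, not its actual implementation. `warning_of` is a hypothetical helper written for this comment, not part of base R.

```r
# Removing the only NA leaves an empty vector.
x <- NA
kept <- x[!is.na(x)]      # logical(0)

sum(kept)                 # 0, the empty sum
length(kept)              # 0
sum(kept) / length(kept)  # NaN, because 0/0 is NaN
mean(x, na.rm = TRUE)     # NaN, consistent with the manual calculation

# Hypothetical helper: return the warning message raised by an
# expression, or NA if it raises no warning.
warning_of <- function(expr) {
  tryCatch({
    expr                  # force the lazily evaluated argument
    NA_character_
  }, warning = function(w) conditionMessage(w))
}

warning_of(min(c()))               # a warning about no non-missing arguments
warning_of(sqrt(-1))               # a warning that NaNs were produced
warning_of(sum(NA, na.rm = TRUE))  # NA: this one is silent
```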

Mar 26 '22 23:03 ReeceGoding

I've linked to this page in the latest version. I'll keep the issue open just in case it attracts similar interesting comments.

Mar 26 '22 23:03 ReeceGoding

I sorta understand the point of view about

sum(NA, na.rm=TRUE)

But suppose x is a vector of monthly sales per seller. If a seller is not on the payroll for a year, you probably want the yearly total to be NA, not zero, so I've had to write code like:

if (all(is.na(x))) total = NA else total = sum(x, na.rm=TRUE)
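
Wrapped up as a function, that pattern might look like the sketch below. The name `sum_or_na` is made up for illustration, and returning `NA_real_` (rather than a logical `NA`) is my own choice so the result is always numeric.

```r
# Hypothetical helper: sum that stays NA when every value is missing.
sum_or_na <- function(x) {
  if (all(is.na(x))) {
    NA_real_               # nothing observed, so the total is unknown
  } else {
    sum(x, na.rm = TRUE)   # ignore the gaps, add up what was observed
  }
}

sum_or_na(c(NA, NA, NA))   # NA
sum_or_na(c(10, NA, 5))    # 15
```

One thing to watch: `all(is.na(x))` is TRUE for a zero-length vector, so this version also returns NA for empty input, which may or may not be what you want.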

Mar 27 '22 03:03 kwstat

I think we agree. I get why they thought it made sense, but whether or not it was a good idea is a totally different question that I have no answer for.

Mar 29 '22 21:03 ReeceGoding
