Frustration-One-Year-With-R
Non-numbers
Your article is extremely thorough. You touched on a few of these points already, but maybe there's something interesting in the notes below. This is not really an "issue", more of a comment.
# Things that are not floating point numbers ---
# In theory:
NaN # Not A (floating point) Number. IEEE 754 standard.
NA # Placeholder for an unknown value. Invented by R. Logical.
NULL # Empty object (like an empty set). Nothing.
Inf # Infinity
-Inf # Negative infinity
# In practice:
# Any mental model of NA/NaN will fail you. Dante's Inferno.
length(NA) # 1 Something there, but we don't know what.
length(NaN) # 1 Something there, but not representable.
length(NULL) # 0 Nothing there.
sqrt(-1) # NaN. 'i' in mathematics, not defined in floating point.
# NaN is an NA, but NA is not an NaN
is.nan(NA) # FALSE
is.na(NaN) # TRUE
min(c()) # Inf
min(c(NA), na.rm=TRUE) # Inf
min(NaN) # NaN
max(c()) # -Inf
max(c(NA), na.rm=TRUE) # -Inf
max(NaN) # NaN
# https://en.wikipedia.org/wiki/Empty_sum
sum(NA) # NA
sum(NA, na.rm=TRUE) # 0 # Horrible
mean(NA) # NA
mean(NA, na.rm=TRUE) # NaN
var(NA) # NA
var(NA, na.rm=TRUE) # NA
# https://en.wikipedia.org/wiki/Empty_product
prod(NA) # NA
prod(NA, na.rm=TRUE) # 1 # Horrible
NA | TRUE # TRUE
NA & FALSE # FALSE
# https://en.wikipedia.org/wiki/Division_by_zero
0/0 # NaN
1/0 # Inf. Shouldn't it be NaN?!
Inf >= NA # NA. If NA is placeholder, this should be TRUE!
NA * 0 # NA. Because NA could be Inf, and Inf*0 is NaN. Right???
NA ^ 0 # 1
NaN ^ 0 # 1
NA %in% 1:3 # FALSE
match(NA, 1:3) # NA
matrix(nrow=2,ncol=2) # matrix initializes with NAs
vector(mode="numeric", length=2) # vector initializes with 0s
# NULL can be assigned to an object.
x <- NULL
x
# NULL assigned to list elements removes them.
x <- list(1,"a",TRUE)
x[[1]] <- NULL
x
# NULL assigned to data.frame columns removes them
x <- data.frame(a=1:2, b=3:4)
x
x$a <- NULL
x
# https://blog.revolutionanalytics.com/2016/07/understanding-na-in-r.html
# https://stats.stackexchange.com/questions/5686/what-is-the-difference-between-nan-and-na
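Since NaN counts as NA but not the other way round, telling the two apart takes a combination of both tests. A minimal sketch, where the vector x is made up purely for illustration:

```r
# Illustrative vector mixing an ordinary number with every kind of non-number.
x <- c(1, NA, NaN, Inf, -Inf)

is.na(x)                 # TRUE for both NA and NaN
is.nan(x)                # TRUE for NaN only
is.na(x) & !is.nan(x)    # TRUE for a "real" NA only
is.finite(x)             # FALSE for NA, NaN, Inf and -Inf alike
```

is.finite() is the one test here that rejects all of the non-numbers at once.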
Very thorough post, and good notes here too! I couldn't help but point out that you CAN take the square root of -1 if you represent it as a complex number. I don't remember the last time I have seen complex numbers being used in R, but they are there 😄
sqrt(0i-1)
#> [1] 0+1i
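The same idea works for an existing numeric vector if it is converted with as.complex() first. A small sketch, where the vector v is only an illustration:

```r
# Illustrative numeric vector containing a negative value.
v <- c(-4, 9)

# Converting to complex first lets sqrt() return complex roots
# instead of producing NaN for the negative entry.
roots <- sqrt(as.complex(v))
roots
```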
A good read! I'll definitely link to this in a future version. I fully admit ignorance regarding these non-numbers, so it's no wonder that I never discovered these issues myself. I know that NA must always be handled with great care, but I always take NaN as a warning sign that I've committed a grave error and must fix it before taking any other steps.
Speaking of warnings, I will defend R by saying that many of these examples throw warnings that you've not shown. However, a lot of them don't, so it's not like R is totally innocent. There's also a handful that I can kind of explain. For example,
sum(NA) # NA
sum(NA, na.rm=TRUE) # 0 # Horrible
is bad, but I can see their reasoning. The sum of NA being NA makes sense, and if you remove NA (as in the second example), then you've got an empty sum, which is certainly 0. The mean example probably returns NaN because it boils down to sum(empty_set)/length(empty_set), which is 0/0. I've got no such defence for the var example, but the use parameter in its documentation seems very relevant.
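Both points in this comment can be checked directly. The sketch below uses a hypothetical helper, warns(), which is not part of base R, to report which of the earlier examples signal a warning. It then walks through the empty-sum reasoning by hand. The division at the end mirrors the explanation above and is not necessarily how mean() is implemented internally.

```r
# Hypothetical helper: TRUE if evaluating `expr` signals a warning.
warns <- function(expr) {
  tryCatch({
    force(expr)   # the lazily evaluated argument is forced inside tryCatch
    FALSE
  }, warning = function(w) TRUE)
}

warns(min(c()))                  # TRUE
warns(max(NA, na.rm = TRUE))     # TRUE
warns(sqrt(-1))                  # TRUE
warns(sum(NA, na.rm = TRUE))     # FALSE
warns(mean(NA, na.rm = TRUE))    # FALSE

# The empty-sum reasoning, step by step.
x <- NA
kept <- x[!is.na(x)]             # logical(0): nothing left once NA is removed
sum(kept)                        # 0, the empty sum
length(kept)                     # 0
sum(kept) / length(kept)         # NaN, i.e. 0/0
```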
I've linked to this page in the latest version. I'll keep the issue open just in case it attracts similar interesting comments.
I sorta understand the point of view about
sum(NA, na.rm=TRUE)
But suppose x is a vector of monthly sales per seller. If a seller is not on the payroll for a year, you probably want the yearly total to be NA, not zero, so I've had to write code like:
total <- if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
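That pattern could be wrapped in a small function. A sketch, with sum_or_na as a hypothetical name; note that it also returns NA for a zero-length vector, because all(is.na(numeric(0))) is TRUE, which may or may not be what you want:

```r
# Hypothetical helper: like sum(x, na.rm = TRUE), but returns NA
# when there is not a single non-missing value to add up.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum_or_na(c(10, NA, 5))   # 15
sum_or_na(c(NA, NA))      # NA
sum_or_na(numeric(0))     # NA, since all() of an empty vector is TRUE
```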
I think we agree. I get why they thought it made sense, but whether or not it was a good idea is a totally different question that I have no answer for.