
Rcpp sugar indexHash (unique, sort_unique, etc.) distinguish signed and unsigned zeros

Open SebKrantz opened this issue 1 year ago • 2 comments

Hi Dirk, sorry if this issue does not reflect as much effort as you expect. I just wanted to flag something reported to collapse (#648) that affects both my hash functions written in C and your hash functions:

library(collapse)
#> Warning: package 'collapse' was built under R version 4.3.3
#> collapse 2.0.17, see ?`collapse-package` or ?`collapse-documentation`

x = round(rnorm(100))
unique(x)               # R
#> [1]  1  0  2 -1 -2
funique(x)              # My hash function in C
#> [1]  1  0  0  2 -1 -2
funique(x, sort = TRUE) # Rcpp::sugar::sort_unique()
#> [1] -2 -1  0  0  1  2
# More explicit proof
collapse:::sortuniqueCpp(x)
#> [1] -2 -1  0  0  1  2

# The solution
y = x + 0L

funique(y)              
#> [1]  1  0  2 -1 -2
collapse:::sortuniqueCpp(y)
#> [1] -2 -1  0  1  2

Created on 2024-10-31 with reprex v2.0.2

In words: R functions like round() can produce both negative and positive zeros (-0 and 0), which compare equal but have different bit patterns and therefore different hashes. A fairly efficient remedy is to add an integer zero (x + 0L), which turns -0 into 0; this makes execution about 3% slower with my very efficient C hash. I'm considering rolling this out, but of course I cannot control your code, so I'm just passing it along as food for thought.
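To illustrate the mechanism outside of R, here is a minimal standalone C++ sketch. It is not Rcpp's or collapse's actual hashing code: `bits_of` and `normalize_zero` are hypothetical helper names introduced only for this example, and the sketch assumes IEEE 754 doubles with default (non fast-math) floating-point semantics.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the raw 64-bit pattern of a double.
// A hash computed from these bits treats -0.0 and +0.0 as different keys.
static std::uint64_t bits_of(double x) {
    std::uint64_t b;
    std::memcpy(&b, &x, sizeof b);  // well-defined way to reinterpret the bytes
    return b;
}

// Hypothetical helper: canonicalize zero before hashing.
// Under IEEE 754 round-to-nearest, -0.0 + 0.0 evaluates to +0.0,
// while every other finite value is returned unchanged.
// Assumption: the code is not compiled with -ffast-math or similar,
// which may allow the compiler to drop the addition.
static double normalize_zero(double x) {
    return x + 0.0;
}
```

With these helpers, `0.0 == -0.0` is true, yet `bits_of(0.0)` and `bits_of(-0.0)` differ in the sign bit, while `bits_of(normalize_zero(-0.0))` equals `bits_of(0.0)`. This mirrors the `x + 0L` workaround in the reprex above.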

SebKrantz, Oct 31 '24 10:10