[BUG] integer64 quietly fails on join with integer
Hello,
The following code returns 0 matches:
library(collapse)
t1 <- data.frame(id = 1:5)
t2 <- fmutate(t1, id = bit64::as.integer64(id))
join(
t1, t2,
on = "id",
how = "full"
)
#> full join: t1[id] 0/5 (0%) <NaN:1st> t2[id] 0/5 (0%)
#> id
#> 1 4607182418800017408
#> 2 4611686018427387904
#> 3 4613937818241073152
#> 4 4616189618054758400
#> 5 4617315517961601024
#> 6 1
#> 7 2
#> 8 3
#> 9 4
#> 10 5
Created on 2025-05-07 with reprex v2.1.1
Thanks as always for a great package.
Thanks, this will need some extra coding in C, which I'm not sure is worth it, as coercing both to integer64 works. But I'll think about it. Just don't expect a very quick fix.
Let me add that the issue is, as you can verify, that coercing to double using as.double() doesn't give the original number back. So simple type conversion (as is possible with as.character()) doesn't solve the issue, and I can't access as.integer64() from C. There would need to be specific logic for the case where you have a double and an integer64.
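For readers who want to see the storage issue directly, here is a minimal sketch, assuming the bit64 package is installed. It only illustrates that an integer64 vector reuses the 8 bytes of a double, so code that reads the values without bit64's methods sees a different number; it is not a description of collapse's internals.

```r
library(bit64)

x <- as.integer64(1:5)
typeof(x)       # "double": integer64 is stored in a double vector plus a class
unclass(x)[1]   # the raw double behind integer64 1 is a tiny denormal, not 1

# The reverse direction: put the integer64 class on ordinary doubles.
# The bits of 1.0 and 2.0 are then read as 64-bit integers.
y <- c(1, 2)
class(y) <- "integer64"
as_chr <- as.character(y)
as_chr          # "4607182418800017408" "4611686018427387904"
```

These two values are the same ones that appear in the first two rows of the join output in the original report.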
Thanks, and obviously no rush or pressure. Two suggestions; feel free to disregard:
- Simply warn/error when integer64 is joined against anything else, or just be more type-strict in general here.
- Use as.character() instead of as.double().
Unfortunately, I don't know C at all, so I can't help.
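Until something like the first suggestion exists, a user-side guard could look roughly like this. It is only a sketch: `check_int64_keys()` is a hypothetical helper, not part of collapse or bit64, and the integer64 column is mocked by setting the class attribute so the example runs in base R.

```r
# Hypothetical helper (not a collapse function): stop early when a join key
# is integer64 in exactly one of the two tables.
check_int64_keys <- function(x, y, on) {
  for (k in on) {
    x64 <- inherits(x[[k]], "integer64")
    y64 <- inherits(y[[k]], "integer64")
    if (xor(x64, y64)) {
      stop("Key '", k, "' is integer64 in only one table; coerce both sides first.",
           call. = FALSE)
    }
  }
  invisible(TRUE)
}

a <- data.frame(id = 1:5)
b <- data.frame(id = 1:5)
# Mock column: carries the class only, it is not a real integer64 value.
b$id <- structure(as.double(1:5), class = "integer64")

# Capture the error message instead of aborting the script.
res <- tryCatch(check_int64_keys(a, b, "id"), error = function(e) conditionMessage(e))
```

Calling the check before join() turns the silent zero-match result into an explicit error.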
Thanks. But let me ask then what is the expected behavior?
fmutate(mtcars[1:5,], wt = bit64::as.integer64(wt)) |> with(as.character(wt))
[1] "2" "2" "2" "3" "3"
mtcars[1:5, "wt"]
[1] 2.620 2.875 2.320 3.215 3.440
It seems that conversion to integer64 does remove the decimals, as a normal as.integer() conversion would do. So actually I think join() gives the right answer here. What is more interesting is this case:
join(
fmutate(mtcars[1:5, ], cyl = bit64::as.integer64(cyl)),
mtcars[1:5, ],
on = "cyl",
how = "inner"
)
Where we have a variable cyl that can actually be represented by an integer.
I'm sorry, my example was an imprecise reproduction of the actual issue, joining with integer. I've updated the original to be more precise. With that, the below works:
join(
t1, t2 |> fmutate(id = as.character(id)),
on = "id",
how = "full"
)
as.double() also works here and would be more efficient. I can see if that can be done internally, but possibly it will not work, as as.double.integer64 is an externally defined method. Will check and report back.
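For reference, the two user-side workarounds mentioned in this thread can be sketched as follows, assuming collapse and bit64 are installed. The names `res_a` and `res_b` are just for illustration.

```r
library(collapse)
library(bit64)

t1 <- data.frame(id = 1:5)
t2 <- fmutate(t1, id = as.integer64(id))

# Option A: promote the integer key to integer64 so both sides share a type.
res_a <- join(fmutate(t1, id = as.integer64(id)), t2, on = "id", how = "full")

# Option B: convert the integer64 key to double at the R level, where
# bit64's as.double method is dispatched.
res_b <- join(t1, fmutate(t2, id = as.double(id)), on = "id", how = "full")
```

Either way, all five ids should match, so the full join should return 5 rows rather than 10. Option B is only safe while the values fit exactly in a double, that is, below 2^53 in magnitude.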
FYI, fmatch() and join() now support integer64.