[BUG] integer64 quietly fails on join with integer

Open arthurgailes opened this issue 8 months ago • 6 comments

Hello,

The following code returns 0 matches:

library(collapse)

t1 <- data.frame(id = 1:5)

t2 <- fmutate(t1, id = bit64::as.integer64(id))

join(
  t1, t2,
  on = "id",
  how = "full"
)
#> full join: t1[id] 0/5 (0%) <NaN:1st> t2[id] 0/5 (0%)
#>                     id
#> 1  4607182418800017408
#> 2  4611686018427387904
#> 3  4613937818241073152
#> 4  4616189618054758400
#> 5  4617315517961601024
#> 6                    1
#> 7                    2
#> 8                    3
#> 9                    4
#> 10                   5

Created on 2025-05-07 with reprex v2.1.1

Thanks as always for a great package.

arthurgailes avatar May 06 '25 20:05 arthurgailes

Thanks, this will need some extra coding in C, which I'm not sure is worth it, as coercing both to integer64 works. But I'll think about it. Just don't expect a very quick fix.

SebKrantz avatar May 06 '25 21:05 SebKrantz
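
A minimal sketch of the workaround mentioned above (coercing both key columns to integer64 before joining). It reuses `t1` and `t2` from the report; `t1_i64` and `res` are names introduced here for illustration only, and the sketch assumes the bit64 package is installed.

```r
library(collapse)

t1 <- data.frame(id = 1:5)
t2 <- fmutate(t1, id = bit64::as.integer64(id))

# Coerce the integer key to integer64 so both sides share one type
t1_i64 <- fmutate(t1, id = bit64::as.integer64(id))

res <- join(t1_i64, t2, on = "id", how = "full")
nrow(res)  # each id should now match once instead of appearing twice
```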

Perhaps let me add that the issue is, as you can verify, that coercing to double using as.double() doesn't give the original number back. So the problem here is that simple type conversion (as is the case with as.character()) doesn't solve the issue, and I can't access as.integer64() from C. There would need to be specific logic for the case where you have a double and an integer64.

SebKrantz avatar May 07 '25 09:05 SebKrantz
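
To illustrate why the types do not line up, here is a small sketch, assuming the bit64 package is installed. An integer64 vector is stored in a double vector whose 64 bits are read as a signed integer, so the same bits mean different numbers depending on the class attribute. The variable names are illustrative only.

```r
library(bit64)

x <- as.integer64(1:5)
typeof(x)      # "double": integer64 reuses double storage
unclass(x)     # the same bits read as doubles (tiny denormal values)
as.double(x)   # bit64's S3 method converts the values at the R level

# The reverse direction: ordinary doubles whose bits are read as integer64.
# This should reproduce the large ids shown in the report above.
y <- c(1, 2, 3, 4, 5)
oldClass(y) <- "integer64"
as.character(y)
```

The first five ids in the reported output look like this second case: the integer ids 1 to 5 were turned into doubles and those bits were then shown under the integer64 class.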

Thanks, and obviously no rush or pressure. Two suggestions; feel free to disregard:

  • Simply warn/error when integer64 is joined against anything else, or just be more type-strict in general here.
  • Use as.character() instead of as.double().

Unfortunately, I don't know C at all so can't help.

arthurgailes avatar May 07 '25 12:05 arthurgailes

Thanks. But let me ask then what is the expected behavior?

fmutate(mtcars[1:5,], wt = bit64::as.integer64(wt)) |> with(as.character(wt))
[1] "2" "2" "2" "3" "3"

mtcars[1:5, "wt"]
[1] 2.620 2.875 2.320 3.215 3.440

It seems that conversion to integer64 does remove the decimals, as a normal as.integer() conversion would do. So actually I think join() gives the right answer here. What is more interesting is this case:

join(
    fmutate(mtcars[1:5,], cyl = bit64::as.integer64(cyl)),
    mtcars[1:5,],
    on = "cyl",
    how = "inner"
)

Where we have a variable cyl that can actually be represented by an integer.

SebKrantz avatar May 07 '25 12:05 SebKrantz

I'm sorry, my example was an imprecise reproduction of the actual issue, joining with integer. I've updated the original to be more precise. With that, the below works:

join(
  t1, t2 |> fmutate(id = as.character(id)),
  on = "id",
  how = "full"
)

arthurgailes avatar May 07 '25 12:05 arthurgailes

as.double() also works here and would be more efficient. I can see if that can be done internally. But possibly it will not work, as as.double.integer64 is an externally defined method. Will check and revert.

SebKrantz avatar May 07 '25 12:05 SebKrantz
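
A sketch of the as.double() variant of the workaround, done by the user at the R level, where bit64's method is available. It reuses `t1` and `t2` from the report; `t2_dbl`, `res` and `big` are illustrative names, and the sketch assumes bit64 is installed. One caveat that is general to doubles: they represent integers exactly only up to 2^53, so for very large keys the character route is the safer of the two.

```r
library(collapse)

t1 <- data.frame(id = 1:5)
t2 <- fmutate(t1, id = bit64::as.integer64(id))

# Convert the integer64 key to double before joining against the integer key
t2_dbl <- fmutate(t2, id = as.double(id))
res <- join(t1, t2_dbl, on = "id", how = "full")
nrow(res)

# Keys above 2^53 may lose precision as doubles; as.character() keeps them exact
big <- bit64::as.integer64("9007199254740993")  # 2^53 + 1
as.character(big)
as.double(big) == 2^53
```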

FYI, fmatch() and join() now support integer64.

SebKrantz avatar Aug 07 '25 17:08 SebKrantz