integer64 can be cast to double and vice versa, but vec_cast_common fails
This is not what I expected. vctrs version 0.3.8:
library(vctrs)
i64 <- bit64::as.integer64(1)
dbl <- 1.0

vec_cast(i64, dbl)
#> [1] 1
vec_cast(dbl, i64)
#> integer64
#> [1] 1
vec_cast_common(dbl, i64)
#> Error in `stop_vctrs()`:
#> ! Can't combine `..1` <double> and `..2` <integer64>.
#> Run `rlang::last_error()` to see where the error occurred.
vec_cast_common(i64, dbl)  # same error with the arguments reversed
This is basically the same thing as https://github.com/r-lib/vctrs/issues/1309#issuecomment-760227425
This is not what I expected
The problem is that there is no way to decide what should be expected.
You could argue that the integer64 value should be coerced to double, like how vec_ptype2(<int>, <dbl>) works. But that goes against what the integer64 maintainer wanted, he chose to make c(<int64>, <dbl>) == <int64> and was very purposeful about this (for better or worse). Also converting integer64->double gets lossy with large values (the linked comment explains more).
Converting both to integer64 is just as bad, because it has a high probability of losing information in the double values (i.e. if there were any fractional values we'd have to error about a lossy cast). It also goes against what we have for vec_cast_common(<int>, <dbl>) == <dbl>.
So basically we advise that you be explicit about which way you want to go here, and force a specific common type yourself ahead of time with as.integer64(), as.double(), or vec_cast_common(.ptype = integer64()/double()).
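For illustration, forcing the choice yourself could look something like this (a minimal sketch; whether double or integer64 is the right target depends on your data):

library(vctrs)
i64 <- bit64::as.integer64(1:3)
dbl <- c(1.5, 2.5, 3.5)
# decide that double is the common type, and cast the integer64 side yourself
vec_cast_common(vec_cast(i64, to = double()), dbl)
# casting the other way (dbl to integer64) would error here, because the
# fractional values make that cast lossy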
I'm trying to write generic code that can handle multiple types in each of the two positions. If I wasn't doing that, then sure, I could do as.integer64() etc. But if I have to do that in most cases, then it's unclear what vec_cast() is for.
In any case, I don't understand the logic of vec_cast() returning a value while vec_cast_common() fails. If a can be cast to b, and indeed b can be cast to a, then surely a and b can be cast to a common type.
More generally, if the idea of vctrs was that original R type casting in e.g. base::c was a jerry-rigged mess, then isn't it consistent to override e.g. bit64::c.integer64? I note that vec_cast methods for integer64 are defined in vctrs itself.
If a can be cast to b, and indeed b can be cast to a, then surely a and b can be cast to a common type
Not necessarily. Even if cast methods for a->b and b->a exist, you still have to decide which of those two cast methods to use when you get objects of type <a> and <b> together with no other information. That is the job of vec_ptype2() and vec_ptype_common().
So really you could think of vec_cast_common() as:
vec_cast_common(..., .ptype = vec_ptype_common(...))
and vec_ptype_common(<int64>, <dbl>) is an error because vec_ptype2.integer64.double() doesn't exist. Again, because it is hard / ambiguous / impossible to decide which of the two cast methods should be used.
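In other words, something like this conceptual sketch (a sketch only, not the actual implementation, which is in C and handles far more cases):

my_cast_common <- function(...) {
  # find the common prototype first; this is the step that errors for
  # <integer64> + <double>, because no vec_ptype2() method exists for the pair
  ptype <- vctrs::vec_ptype_common(...)
  # then cast every input to that prototype
  lapply(list(...), vctrs::vec_cast, to = ptype)
}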
then isn't it consistent to override e.g. bit64::c.integer64
Overriding base R is one thing, but I feel like overriding the wishes of another package author is a little aggressive. In the description of the bit64 package it even says:
they have different semantics when combined with double, e.g. integer64 + double => integer64
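That documented behaviour is easy to confirm (only the class is shown here; assuming bit64 coerces the double operand as its description says):

library(bit64)
x <- as.integer64(1) + 2.5
class(x)  # "integer64" -- the double operand is coerced, per the semantics above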
I'm trying to write generic code that can handle multiple types in each of the two positions
Then it sounds like it would be on your users to make the decision about whether they want int64 or double in the final result.
I'm still not even sure which way you'd like it to go. I assume you want the integer64 to be converted to a double?
OK, so the issue is that casting either way is fine, but casting to the common type makes it too hard to decide what to do. I think in those circumstances I would just pick one! FWIW, I'd assume that it's better to cast to double (and maybe die when the integers are too large) than to cast to integer (and silently lose anything after the decimal point).
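For context on "too large": a double can represent integers exactly only up to 2^53, so very large integer64 values silently lose precision rather than erroring when widened. A quick check, assuming bit64's as.double()/as.integer64() coercion methods behave as documented:

library(bit64)
big <- as.integer64("9007199254740993")  # 2^53 + 1
# widening to double and converting back does not round-trip
as.integer64(as.double(big)) == big
#> expected FALSE: 2^53 + 1 is not exactly representable as a double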
My users just want to do e.g.
x <- bit64::as.integer64(rnorm(100))
santoku::chop(x, breaks = c(-2, 0, 2)) # like base::cut
and get a sensible answer, which will be a factor. So for them, there should be no decision to make. In turn, I just want to make mathematical comparisons between the two sets of numbers, and to be able to extend breaks to e.g. min(x) and max(x) if necessary. I thought vec_cast_common() would help with that by ensuring that x and breaks have a compatible base type. I'd use vec_cast(), one way or the other, but I'm not sure whether that will break different classes... in short, despite reading all the vctrs vignettes, I don't feel I have much of a "consistent mental model" of what is going on.
We will consider adding a double common type for vec_ptype2(<int64>, <dbl>).
In the meantime, typically when we have a generic function that works with any vector type, like chop(x, breaks), then I'd argue that x is the "required" / "main" input, and breaks is the "optional" one. When we have cases like that, we typically do breaks <- vec_cast(breaks, to = x) to enforce that breaks is the same type as x, which is generally what the user would expect. In other words, we don't want the common type here, because we assume that the user really wanted to use the type of x, and breaks was an optional thing they typed in by hand like your chop(<int64>, breaks = c(-2, 0, 2)) example from above.
We generally do this when the return value has the same type as x, to enforce type stability (i.e. the input and output have the same type). I know you always return a factor here, but I think the general principle of having internal type stability on x still applies, and it would bypass the need for a common type between integer64 and double for now.
On holiday with weak connectivity but will get back to you. Thanks for thinking about this.
So I did come back to this. I understand your logic for using vec_cast(), but it wouldn't be ideal here. Consider:
chop(1:10, c(2.5, 7.5))
If one cast breaks to integer, then this would become chop(1:10, c(2L, 7L)), which is not what the user intends. Finding the common supertype, with a minimum of information loss, is what's required. What counts as information loss, in this particular case, is anything related to comparison operations. I'm fine with e.g. time series objects losing their attributes.
If you have:
chop(x = 1:10, breaks = c(2.5, 7.5))
And your internal code does
vec_cast(breaks, to = x)
Then that is an error if the double breaks have any fractional components. i.e.:
library(vctrs)
chop <- function(x, breaks) {
vec_cast(breaks, to = x)
}
# this is lossy
chop(1L, breaks = c(2.5, 3.5))
#> Error in `chop()`:
#> ! Can't convert from `breaks` <double> to <integer> due to loss of precision.
#> • Locations: 1, 2
# this is fine
chop(1L, breaks = c(2, 3))
#> [1] 2 3
I personally still don't think you want a common super type. If you have an integer x and the user supplies fractional breaks, I feel like they are probably doing something wrong. If they supply an integerish double vector of breaks like c(2, 3) in the example above, that is reasonable (they just didn't supply the L to make it an integer) and we allow that.
Fractional breaks are a reasonable way to make sure you know that different integers will go in different classes, without having to think about left/right closure. But it's true there's always a risk.
At the moment I'm hacking a package-specific cast which usually falls back to vec_cast but can be a bit more lenient for specific cases.
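A minimal sketch of what such a wrapper could look like (hypothetical helper name and behaviour, not santoku's actual code): special-case the integer64/double pair by widening to double, and fall back to vec_cast_common() for everything else.

# hypothetical helper -- widens integer64/double pairs to double,
# otherwise defers to vctrs to find (or refuse) a common type
cast_common2 <- function(x, breaks) {
  is_i64 <- function(v) inherits(v, "integer64")
  # note: is.double() alone is TRUE for integer64 (its values are stored in
  # a double vector), so exclude it explicitly
  is_dbl <- function(v) is.double(v) && !is_i64(v)
  if (is_i64(x) && is_dbl(breaks)) {
    # be lenient: widen the integer64 side to double instead of erroring
    x <- vctrs::vec_cast(x, to = double())
  } else if (is_dbl(x) && is_i64(breaks)) {
    breaks <- vctrs::vec_cast(breaks, to = double())
  } else {
    # everything else: let vctrs find (or refuse) the common type
    casted <- vctrs::vec_cast_common(x, breaks)
    x <- casted[[1]]
    breaks <- casted[[2]]
  }
  list(x = x, breaks = breaks)
}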