ARROW-17425: [R] `lubridate::as_datetime()` in dplyr query should be able to handle time in sub seconds
This change allows strings containing sub-seconds and double types to be used as input to lubridate::as_datetime().
1.5 |>
arrow::arrow_table(x = _) |>
dplyr::mutate(
y = lubridate::as_datetime(x)
) |>
dplyr::collect() |>
dplyr::mutate(
z = lubridate::as_datetime(x),
is_equal = (y == z)
)
#> x y z is_equal
#> 1 1.5 1970-01-01 00:00:01 1970-01-01 00:00:01 TRUE
And, because the timestamp type generated by as_datetime is expected to be used in combination with other functions, fix the bug of as.Date and lubridate::as_date that could cause an error if a sub-seconds timestamp was entered.
As a breaking change, the return type of as_datetime() will be nanoseconds, but I hope this will not have a major impact, since originally as_datetime() |> as.integer() or as_datetime() |> as.numeric() could not be used because it would try to cast to int32 or double, resulting in an error.
(We have to cast timestamp to int64)
arrow 9.0.0
1 |>
arrow::arrow_table(x = _) |>
dplyr::mutate(
x = lubridate::as_datetime(x),
y = cast(x, arrow::int64())
) |>
dplyr::collect()
#> x y
#> 1 1970-01-01 00:00:01 1
This PR
1 |>
arrow::arrow_table(x = _) |>
dplyr::mutate(
x = lubridate::as_datetime(x),
y = cast(x, arrow::int64())
) |>
dplyr::collect()
#> x y
#> 1 1970-01-01 00:00:01 1000000000
https://issues.apache.org/jira/browse/ARROW-17425
:warning: Ticket has not been started in JIRA, please click 'Start Progress'.
since originally
as_datetime() |> as.integer()oras_datetime() |> as.numeric()could not be used because it would try to cast to int32 or double, resulting in an error.
Maybe we have to add these test cases (and JIRA issues for implementing these ARROW-17428).
This PR seems to have stopped working because of #14553. I don't know what the problem is.
Additional apologies for letting this one slip by our PR review triage! I will look at this tomorrow 🙂
If I'm reading this properly, as.integer() on a timestamp will error? If that's the case, this wouldn't be a breaking change? In any case silently ignoring fractional seconds seems worse?
I took a quick skim and I don't see any problems off the top of my head. There might possibly be a performance hit from the added complexity of the function but my preference is always safety >>> speed (until somebody complains).
@paleolimbot Nah, what I mean is that casting a datetime to an integer via cast() will return the integer value in nanoseconds rather than seconds.
I think that we always want cast() to do whatever Arrow compute does, and in this case, the cast() just drops the metadata (true for most of the datetime types I know about). That property is useful for doing things like this that require that value.
I think that we always want
cast()to do whatever Arrow compute does, and in this case, thecast()just drops the metadata (true for most of the datetime types I know about). That property is useful for doing things like this that require that value.
OK, makes sense. In that case then this looks pretty much good to go, will just give it a final look over.
If I'm reading this properly,
as.integer()on a timestamp will error?
Yes, this needs to be implemented in the future. (#20369)
Benchmark runs are scheduled for baseline = 872bbda4eb2e75690f0c756c616c171ad6615739 and contender = 157062d4b3a8709555ce4abae10c554e4df05cca. 157062d4b3a8709555ce4abae10c554e4df05cca is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Finished :arrow_down:0.69% :arrow_up:0.0%] test-mac-arm
[Finished :arrow_down:0.0% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:0.5% :arrow_up:0.12%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 157062d4 ec2-t3-xlarge-us-east-2
[Finished] 157062d4 test-mac-arm
[Finished] 157062d4 ursa-i9-9960x
[Finished] 157062d4 ursa-thinkcentre-m75q
[Finished] 872bbda4 ec2-t3-xlarge-us-east-2
[Finished] 872bbda4 test-mac-arm
[Finished] 872bbda4 ursa-i9-9960x
[Finished] 872bbda4 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java