r-polars
Support the `clock` package's types
It may also make sense to support conversions from the classes provided by the clock package, which has a time type with nanosecond precision and would be more appropriate for time zone handling.
Originally posted by @eitsupi in https://github.com/pola-rs/r-polars/issues/578#issuecomment-1847203094
Did not know of the clock package, but it seems nice.
It seems the clock representations are R numerics (possibly pseudo-integers) with an upper and a lower part, giving an internal precision of ~2^(52+52), which is well over e.g. the 2^64 of u64 nanoseconds. The downside is lower performance. The upside is that anyone can fairly easily tinker with the internals.
All time precisions (day, second, nanosecond, ...) share the same class and differ only in one attribute called precision, which is an enum-like R integer. The subcomponents are glued together into a single vector interface via vctrs::vctrs_rcrd.
I have not found the conversion arithmetic yet, but this should be a fairly straightforward conversion using the extendr-api.
It seems year 3712 is not supported, as the datetime overflows back to 1958?! With no warning?!?! This is surprising: overflowing without a warning in a package that is not designed for speed. I imagined that if the lower part is allocated to nanoseconds, then the upper part could describe seconds since the origin, and there should be some ~140 million years of range.
Both "s" and "ns" also have issues with "0001-01-01 01:01:01.000000001", whereas "ms" and "us" work fine. This has got to be a bug, I think.
char_times = c(
"0001-01-01 01:01:01.000000001",
"2212-01-01 12:34:57.123456789",
"3712-01-01 12:34:56.123456789"
)
fmt = "%Y-%m-%d %H:%M:%OS"
clock_times = list(
ns = clock::naive_time_parse(char_times , format = fmt, precision = "nanosecond"),
us = clock::naive_time_parse(char_times , format = fmt, precision = "microsecond"),
ms = clock::naive_time_parse(char_times , format = fmt, precision = "millisecond"),
s = clock::naive_time_parse(char_times , format = fmt, precision = "nanosecond"),
d = clock::naive_time_parse(char_times , format = fmt, precision = "day")
)
clock_times
#> $ns
#> <naive_time<nanosecond>[3]>
#> [1] "1754-08-30T23:44:42.128654849" "2212-01-01T12:34:57.123456789"
#> [3] "1958-05-04T13:51:14.994801941"
#>
#> $us
#> <naive_time<microsecond>[3]>
#> [1] "0001-01-01T01:01:01.000000" "2212-01-01T12:34:57.123456"
#> [3] "3712-01-01T12:34:56.123456"
#>
#> $ms
#> <naive_time<millisecond>[3]>
#> [1] "0001-01-01T01:01:01.000" "2212-01-01T12:34:57.123"
#> [3] "3712-01-01T12:34:56.123"
#>
#> $s
#> <naive_time<nanosecond>[3]>
#> [1] "1754-08-30T23:44:42.128654849" "2212-01-01T12:34:57.123456789"
#> [3] "1958-05-04T13:51:14.994801941"
#>
#> $d
#> <naive_time<day>[3]>
#> [1] "0001-01-01" "2212-01-02" "3712-01-02"
lapply(clock_times,\(x) unclass(x) |> str()) |> invisible()
#> List of 2
#> $ lower: num [1:3] 5.65e+08 3.93e+09 2.06e+09
#> $ upper: num [1:3] 2.71e+09 2.67e+09 1.72e+09
#> - attr(*, "precision")= int 10
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 2.13e+09 2.15e+09 2.16e+09
#> $ upper: num [1:3] 3.67e+09 3.11e+09 3.96e+09
#> - attr(*, "precision")= int 9
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 2.15e+09 2.15e+09 2.15e+09
#> $ upper: num [1:3] 3.99e+09 3.17e+08 9.32e+08
#> - attr(*, "precision")= int 8
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 5.65e+08 3.93e+09 2.06e+09
#> $ upper: num [1:3] 2.71e+09 2.67e+09 1.72e+09
#> - attr(*, "precision")= int 10
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 2.15e+09 2.15e+09 2.15e+09
#> $ upper: num [1:3] 4.29e+09 8.84e+04 6.36e+05
#> - attr(*, "precision")= int 4
#> - attr(*, "clock")= int 1
lapply(clock_times, class)
#> $ns
#> [1] "clock_naive_time" "clock_time_point" "clock_rcrd" "vctrs_rcrd"
#> [5] "vctrs_vctr"
#>
#> $us
#> [1] "clock_naive_time" "clock_time_point" "clock_rcrd" "vctrs_rcrd"
#> [5] "vctrs_vctr"
#>
#> $ms
#> [1] "clock_naive_time" "clock_time_point" "clock_rcrd" "vctrs_rcrd"
#> [5] "vctrs_vctr"
#>
#> $s
#> [1] "clock_naive_time" "clock_time_point" "clock_rcrd" "vctrs_rcrd"
#> [5] "vctrs_vctr"
#>
#> $d
#> [1] "clock_naive_time" "clock_time_point" "clock_rcrd" "vctrs_rcrd"
#> [5] "vctrs_vctr"
Created on 2023-12-12 with reprex v2.0.2
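The nanosecond results above (year 3712 printed as 1958, year 1 printed as 1754) look like what one would get if the nanosecond count were held in a signed 64-bit integer that silently wraps modulo 2^64. That storage is an assumption and is not confirmed anywhere in this thread. A minimal base R sketch of that arithmetic follows. It works in whole seconds because doubles cannot hold such nanosecond counts exactly, and the helper names are made up for illustration.

```r
# Assumption: the nanosecond count is a signed 64-bit integer that wraps
# modulo 2^64. All helper names below are hypothetical, not clock functions.
ns_per_second <- 1e9
wrap_seconds <- 2^64 / ns_per_second  # length of one full wrap, in seconds

# Seconds since 1970-01-01 for a date plus a time of day (no time zone).
epoch_seconds <- function(date, h, m, s) {
  as.numeric(as.Date(date)) * 86400 + h * 3600 + m * 60 + s
}

# Fold a second count into the window of one wrap centred on the epoch.
wrap_like_int64_ns <- function(secs) {
  secs - round(secs / wrap_seconds) * wrap_seconds
}

# Calendar date of the folded value.
wrapped_date <- function(secs) {
  days <- floor(wrap_like_int64_ns(secs) / 86400)
  format(as.Date(days, origin = "1970-01-01"))
}

wrapped_date(epoch_seconds("3712-01-01", 12, 34, 56))  # "1958-05-04"
wrapped_date(epoch_seconds("0001-01-01", 1, 1, 1))     # "1754-08-30"
wrapped_date(epoch_seconds("2212-01-01", 12, 34, 57))  # "2212-01-01"

# Largest distance from 1970 that fits in 2^63 nanoseconds, in years.
max_years <- 2^63 / ns_per_second / (86400 * 365.2425)
floor(max_years)  # 292
```

Under this assumption only dates within roughly 292 years of 1970 survive at nanosecond precision. That would explain why 2212 parses correctly while years 1 and 3712 do not, and why the coarser precisions are unaffected.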
Corrected example (in the example above, "second" was mistyped as "nanosecond"):
char_times = c(
"0001-01-01 01:01:01.000000001",
"2212-01-01 12:34:57.123456789",
"3712-01-01 12:34:56.123456789"
)
fmt = "%Y-%m-%d %H:%M:%OS"
clock_times = list(
ns = clock::naive_time_parse(char_times , format = fmt, precision = "nanosecond"),
us = clock::naive_time_parse(char_times , format = fmt, precision = "microsecond"),
ms = clock::naive_time_parse(char_times , format = fmt, precision = "millisecond"),
s = clock::naive_time_parse(char_times , format = fmt, precision = "second"),
d = clock::naive_time_parse(char_times , format = fmt, precision = "day")
)
clock_times
#> $ns
#> <naive_time<nanosecond>[3]>
#> [1] "1754-08-30T23:44:42.128654849" "2212-01-01T12:34:57.123456789"
#> [3] "1958-05-04T13:51:14.994801941"
#>
#> $us
#> <naive_time<microsecond>[3]>
#> [1] "0001-01-01T01:01:01.000000" "2212-01-01T12:34:57.123456"
#> [3] "3712-01-01T12:34:56.123456"
#>
#> $ms
#> <naive_time<millisecond>[3]>
#> [1] "0001-01-01T01:01:01.000" "2212-01-01T12:34:57.123"
#> [3] "3712-01-01T12:34:56.123"
#>
#> $s
#> <naive_time<second>[3]>
#> [1] "0001-01-01T01:01:01" "2212-01-01T12:34:57" "3712-01-01T12:34:56"
#>
#> $d
#> <naive_time<day>[3]>
#> [1] "0001-01-01" "2212-01-02" "3712-01-02"
lapply(clock_times,\(x) unclass(x) |> str()) |> invisible()
#> List of 2
#> $ lower: num [1:3] 5.65e+08 3.93e+09 2.06e+09
#> $ upper: num [1:3] 2.71e+09 2.67e+09 1.72e+09
#> - attr(*, "precision")= int 10
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 2.13e+09 2.15e+09 2.16e+09
#> $ upper: num [1:3] 3.67e+09 3.11e+09 3.96e+09
#> - attr(*, "precision")= int 9
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 2.15e+09 2.15e+09 2.15e+09
#> $ upper: num [1:3] 3.99e+09 3.17e+08 9.32e+08
#> - attr(*, "precision")= int 8
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 2.15e+09 2.15e+09 2.15e+09
#> $ upper: num [1:3] 2.29e+09 3.34e+09 3.43e+09
#> - attr(*, "precision")= int 7
#> - attr(*, "clock")= int 1
#> List of 2
#> $ lower: num [1:3] 2.15e+09 2.15e+09 2.15e+09
#> $ upper: num [1:3] 4.29e+09 8.84e+04 6.36e+05
#> - attr(*, "precision")= int 4
#> - attr(*, "clock")= int 1
Created on 2024-02-27 with reprex v2.0.2
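The `lower` and `upper` values printed by `str()` above can be reproduced by a simple split of a signed count into two 32-bit words stored as doubles. This is inferred only from the printed numbers, not from clock's source, so treat it as a guess. In that guess, the field named `lower` holds the high word shifted by 2^31 and `upper` holds the unsigned low word. The helpers below are hypothetical and only exact for counts below 2^53 in magnitude.

```r
# Hypothetical helpers, inferred from the printed output; not clock's API.
# x is a count since 1970-01-01 in the unit given by the precision.
split_count <- function(x) {
  hi <- floor(x / 2^32)  # signed high 32-bit word
  lo <- x - hi * 2^32    # unsigned low 32-bit word, in [0, 2^32)
  list(lower = hi + 2^31, upper = lo)
}

# Inverse of split_count().
join_count <- function(lower, upper) {
  (lower - 2^31) * 2^32 + upper
}

# Day precision: 0001-01-01 is day -719162 relative to 1970-01-01.
day_year1 <- split_count(-719162)
day_year1$lower  # 2147483647, printed above as 2.15e+09
day_year1$upper  # 4294248134, printed above as 4.29e+09

# Millisecond precision: 2212-01-01T12:34:57.123.
ms_2212 <- 88388 * 86400000 + 45297123
ms_parts <- split_count(ms_2212)
ms_parts$upper   # 316644835, printed above as 3.17e+08

# Round trip back to the original count.
join_count(ms_parts$lower, ms_parts$upper) == ms_2212  # TRUE
```

If this reading is right, a Rust-side conversion would only need to undo the 2^31 shift and recombine the two words into an i64.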
@eitsupi can this be closed too?
We need to be able to convert Polars Datetime types to clock types, just as we provide multiple ways to convert Int64. Currently no such API is provided, so we need to wait for an update on the clock side (r-lib/clock#365).
e5898b4518cc72ec20890aad0c92e64c41c5d92a supports exporting datetime as clock naive time/zoned time.
fa157e680724e8a32fbb84f4fa12c52fd92917b0 supports importing clock_time_point as datetime on the Rust side.
It seems to be twice as fast as the current implementation, which is implemented only on the R side.
library(clock)
time_clock <- seq_len(10^5) |>
as.POSIXct(tz = "UTC") |>
as_zoned_time()
bench::mark(
main = {
polars::as_polars_series(time_clock)
},
neo = {
neopolars::as_polars_series(time_clock)
},
check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 main 123.8ms 125.4ms 7.97 12.89MB 7.97
#> 2 neo 59.8ms 60.4ms 15.8 5.08MB 2.26
Created on 2024-09-01 with reprex v2.1.1