purrr
purrr copied to clipboard
accumulate with Date data
accumulate
does not preserve Date class. I would have expected that carried
would be of Date class as the anonymous function returns Date class objects but carried
in out is numeric.
# simplified from
# https://stackoverflow.com/questions/70021943/r-recursive-lag-for-previous-dependent-value/70023303#70023303
library(dplyr)
library(purrr)
p <- structure(list(CCDATE = structure(c(18716, 18715, 18713, 18712,
8895, 18498), class = "Date"), MCCDATE = structure(c(18686, 18686,
18685, 18684, 18864, 18467), class = "Date")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
str(p)
tibble [6 x 2] (S3: tbl_df/tbl/data.frame)
## $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
## $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...
out <- p %>%
mutate(carried = accumulate(2:n(), .init = first(MCCDATE),
function(carried, i) if (CCDATE[i-1] <= carried) MCCDATE[i-1] else carried))
str(out)
## tibble [6 x 3] (S3: tbl_df/tbl/data.frame)
## $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
## $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...
## $ carried: num [1:6] 18686 18686 18686 18686 18686 ... <---------------------------------------
I encountered this issue as well and can confirm it. In fact, it is not just the class attribute that is affected, but the returned vector has no attributes at all. Thus, with a date
variable you can set the class and keep going. With the interval
class from lubridate
data from the attributes is lost, rendering the returned vector useless. purrr::reduce()
on the other hand yields the correct result.
This behavior is exactly the same for base::Reduce()
(accumulate = T
returns a vector without attributes, while accumulate = F
works as expected). I have not looked at the code, but from that I guess the purrr
functions might be just a wrapper around the base function?
In the following reprex I simplified the problem a bit further. With \(x,y) y
as the .f
-argument, the returned vectors should be identical to the original.
library(tidyverse)
library(lubridate)
d <- tibble(
num = 1:10,
fact = factor(letters)[1:10],
dat = seq(ymd("2022-01-01"), ymd("2022-01-10"), length.out = 10),
dttm = dat + hours(1),
intv = dat %--% (dat + 1)
)
d
#> # A tibble: 10 × 5
#> num fact dat dttm intv
#> <int> <fct> <date> <dttm> <Interval>
#> 1 1 a 2022-01-01 2022-01-01 01:00:00 2022-01-01 UTC--2022-01-02 UTC
#> 2 2 b 2022-01-02 2022-01-02 01:00:00 2022-01-02 UTC--2022-01-03 UTC
#> 3 3 c 2022-01-03 2022-01-03 01:00:00 2022-01-03 UTC--2022-01-04 UTC
#> 4 4 d 2022-01-04 2022-01-04 01:00:00 2022-01-04 UTC--2022-01-05 UTC
#> 5 5 e 2022-01-05 2022-01-05 01:00:00 2022-01-05 UTC--2022-01-06 UTC
#> 6 6 f 2022-01-06 2022-01-06 01:00:00 2022-01-06 UTC--2022-01-07 UTC
#> 7 7 g 2022-01-07 2022-01-07 01:00:00 2022-01-07 UTC--2022-01-08 UTC
#> 8 8 h 2022-01-08 2022-01-08 01:00:00 2022-01-08 UTC--2022-01-09 UTC
#> 9 9 i 2022-01-09 2022-01-09 01:00:00 2022-01-09 UTC--2022-01-10 UTC
#> 10 10 j 2022-01-10 2022-01-10 01:00:00 2022-01-10 UTC--2022-01-11 UTC
d2 <- d |>
map_dfc(accumulate, \(x,y) y)
d2
#> # A tibble: 10 × 5
#> num fact dat dttm intv
#> <int> <fct> <int> <dbl> <dbl>
#> 1 1 a 18993 1640998800 86400
#> 2 2 b 18994 1641085200 86400
#> 3 3 c 18995 1641171600 86400
#> 4 4 d 18996 1641258000 86400
#> 5 5 e 18997 1641344400 86400
#> 6 6 f 18998 1641430800 86400
#> 7 7 g 18999 1641517200 86400
#> 8 8 h 19000 1641603600 86400
#> 9 9 i 19001 1641690000 86400
#> 10 10 j 19002 1641776400 86400
Interestingly, this works for factors, which is also an S3 object, but not for the date or datetime, which are also S3, nor for the interval which is an S4 object.
This seems already fixed in the dev version.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
p <- structure(list(CCDATE = structure(c(18716, 18715, 18713, 18712,
8895, 18498), class = "Date"), MCCDATE = structure(c(18686, 18686,
18685, 18684, 18864, 18467), class = "Date")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
str(p)
#> tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
#> $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
#> $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...
out <- p %>%
mutate(carried = accumulate(2:n(), .init = first(MCCDATE),
function(carried, i) if (CCDATE[i-1] <= carried) MCCDATE[i-1] else carried))
str(out)
#> tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
#> $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
#> $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...
#> $ carried: Date[1:6], format: "2021-02-28" "2021-02-28" ...
Created on 2022-03-31 by the reprex package (v2.0.1)
This still looks broken to me:
library(purrr)
x <- as.Date("2020-01-01") + 1:10
accumulate(x, ~ .y)
#> [1] 18263 18264 18265 18266 18267 18268 18269 18270 18271 18272
Created on 2022-08-23 by the reprex package (v2.0.1)
Choosing to fix this will be a relatively big change because it implies we'll need to switch from base R semantics to vctrs semantics. It's probably worth it but there's some chance of breaking existing code.
Really? It still works for me. As I wrote above, we need to use the dev version of purrr (another chance is that the locale affects this...?)
library(purrr)
x <- as.Date("2020-01-01") + 1:10
accumulate(x, ~ .y)
#> [1] "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05" "2020-01-06"
#> [6] "2020-01-07" "2020-01-08" "2020-01-09" "2020-01-10" "2020-01-11"
Created on 2022-08-23 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23 ucrt)
#> os Windows 10 x64 (build 22000)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Japanese_Japan.utf8
#> ctype Japanese_Japan.utf8
#> tz Asia/Tokyo
#> date 2022-08-23
#> pandoc 2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
#> evaluate 0.16 2022-08-09 [1] CRAN (R 4.2.1)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)
#> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.1)
#> knitr 1.39 2022-04-26 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> purrr * 0.3.4.9000 2022-08-23 [1] Github (tidyverse/purrr@c2efde2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.1)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.0 2022-06-28 [1] CRAN (R 4.2.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.1)
#> rlang 1.0.4 2022-07-12 [1] CRAN (R 4.2.1)
#> rmarkdown 2.15 2022-08-16 [1] CRAN (R 4.2.1)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
#> styler 1.7.0 2022-03-13 [1] CRAN (R 4.2.0)
#> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
#> vctrs 0.4.1.9000 2022-08-23 [1] Github (r-lib/vctrs@64d442f)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.32 2022-08-10 [1] CRAN (R 4.2.1)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
#>
#> [1] C:/Users/Yutani/AppData/Local/R/win-library/4.2
#> [2] C:/Program Files/R/R-4.2.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Ooops, I think I forgot to pull 😳