purrr icon indicating copy to clipboard operation
purrr copied to clipboard

accumulate with Date data

Open ggrothendieck opened this issue 3 years ago • 3 comments

accumulate does not preserve Date class. I would have expected that carried would be of Date class as the anonymous function returns Date class objects but carried in out is numeric.

# simplified from
# https://stackoverflow.com/questions/70021943/r-recursive-lag-for-previous-dependent-value/70023303#70023303

library(dplyr)
library(purrr)    

p <- structure(list(CCDATE = structure(c(18716, 18715, 18713, 18712, 
8895, 18498), class = "Date"), MCCDATE = structure(c(18686, 18686, 
18685, 18684, 18864, 18467), class = "Date")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
 
str(p)
tibble [6 x 2] (S3: tbl_df/tbl/data.frame)
##  $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
##  $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...

out <- p  %>%
 mutate(carried = accumulate(2:n(), .init = first(MCCDATE),
  function(carried, i) if (CCDATE[i-1] <= carried) MCCDATE[i-1] else carried))

str(out)
## tibble [6 x 3] (S3: tbl_df/tbl/data.frame)
##  $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
##  $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...
##  $ carried: num [1:6] 18686 18686 18686 18686 18686 ...  <---------------------------------------

ggrothendieck avatar Nov 19 '21 16:11 ggrothendieck

I encountered this issue as well and can confirm it. In fact, it is not just the class attribute that is affected, but the returned vector has no attributes at all. Thus, with a date variable you can set the class and keep going. With the interval class from lubridate data from the attributes is lost, rendering the returned vector useless. purrr::reduce() on the other hand yields the correct result.

This behavior is exactly the same for base::Reduce() (accumulate = T returns a vector without attributes, while accumulate = F works as expected). I have not looked at the code, but from that I guess the purrr functions might be just a wrapper around the base function?

steffen-stell avatar Jan 31 '22 14:01 steffen-stell

In the following reprex I simplified the problem a bit further. With \(x,y) y as the .f-argument, the returned vectors should be identical to the original.

library(tidyverse)
library(lubridate)

d <- tibble(
  num = 1:10, 
  fact = factor(letters)[1:10],
  dat = seq(ymd("2022-01-01"), ymd("2022-01-10"), length.out = 10),
  dttm = dat + hours(1),
  intv = dat %--% (dat + 1)
)

d
#> # A tibble: 10 × 5
#>      num fact  dat        dttm                intv                          
#>    <int> <fct> <date>     <dttm>              <Interval>                    
#>  1     1 a     2022-01-01 2022-01-01 01:00:00 2022-01-01 UTC--2022-01-02 UTC
#>  2     2 b     2022-01-02 2022-01-02 01:00:00 2022-01-02 UTC--2022-01-03 UTC
#>  3     3 c     2022-01-03 2022-01-03 01:00:00 2022-01-03 UTC--2022-01-04 UTC
#>  4     4 d     2022-01-04 2022-01-04 01:00:00 2022-01-04 UTC--2022-01-05 UTC
#>  5     5 e     2022-01-05 2022-01-05 01:00:00 2022-01-05 UTC--2022-01-06 UTC
#>  6     6 f     2022-01-06 2022-01-06 01:00:00 2022-01-06 UTC--2022-01-07 UTC
#>  7     7 g     2022-01-07 2022-01-07 01:00:00 2022-01-07 UTC--2022-01-08 UTC
#>  8     8 h     2022-01-08 2022-01-08 01:00:00 2022-01-08 UTC--2022-01-09 UTC
#>  9     9 i     2022-01-09 2022-01-09 01:00:00 2022-01-09 UTC--2022-01-10 UTC
#> 10    10 j     2022-01-10 2022-01-10 01:00:00 2022-01-10 UTC--2022-01-11 UTC

d2 <- d |> 
  map_dfc(accumulate, \(x,y) y)

d2
#> # A tibble: 10 × 5
#>      num fact    dat       dttm  intv
#>    <int> <fct> <int>      <dbl> <dbl>
#>  1     1 a     18993 1640998800 86400
#>  2     2 b     18994 1641085200 86400
#>  3     3 c     18995 1641171600 86400
#>  4     4 d     18996 1641258000 86400
#>  5     5 e     18997 1641344400 86400
#>  6     6 f     18998 1641430800 86400
#>  7     7 g     18999 1641517200 86400
#>  8     8 h     19000 1641603600 86400
#>  9     9 i     19001 1641690000 86400
#> 10    10 j     19002 1641776400 86400

Interestingly, this works for factors, which is also an S3 object, but not for the date or datetime, which are also S3, nor for the interval which is an S4 object.

steffen-stell avatar Jan 31 '22 16:01 steffen-stell

This seems already fixed in the dev version.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)    

p <- structure(list(CCDATE = structure(c(18716, 18715, 18713, 18712, 
8895, 18498), class = "Date"), MCCDATE = structure(c(18686, 18686, 
18685, 18684, 18864, 18467), class = "Date")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
 
str(p)
#> tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
#>  $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...

out <- p  %>%
 mutate(carried = accumulate(2:n(), .init = first(MCCDATE),
  function(carried, i) if (CCDATE[i-1] <= carried) MCCDATE[i-1] else carried))

str(out)
#> tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ CCDATE : Date[1:6], format: "2021-03-30" "2021-03-29" ...
#>  $ MCCDATE: Date[1:6], format: "2021-02-28" "2021-02-28" ...
#>  $ carried: Date[1:6], format: "2021-02-28" "2021-02-28" ...

Created on 2022-03-31 by the reprex package (v2.0.1)

yutannihilation avatar Mar 31 '22 11:03 yutannihilation

This still looks broken to me:

library(purrr)
x <- as.Date("2020-01-01") + 1:10
accumulate(x, ~ .y)
#>  [1] 18263 18264 18265 18266 18267 18268 18269 18270 18271 18272

Created on 2022-08-23 by the reprex package (v2.0.1)

Choosing to fix this will be a relatively big change because it implies we'll need to switch from base R semantics to vctrs semantics. It's probably worth it but there's some chance of breaking existing code.

hadley avatar Aug 23 '22 07:08 hadley

Really? It still works for me. As I wrote above, we need to use the dev version of purrr (another chance is that the locale affects this...?)

library(purrr)
x <- as.Date("2020-01-01") + 1:10
accumulate(x, ~ .y)
#>  [1] "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05" "2020-01-06"
#>  [6] "2020-01-07" "2020-01-08" "2020-01-09" "2020-01-10" "2020-01-11"

Created on 2022-08-23 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23 ucrt)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Japanese_Japan.utf8
#>  ctype    Japanese_Japan.utf8
#>  tz       Asia/Tokyo
#>  date     2022-08-23
#>  pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  evaluate      0.16       2022-08-09 [1] CRAN (R 4.2.1)
#>  fansi         1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>  knitr         1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr       * 0.3.4.9000 2022-08-23 [1] Github (tidyverse/purrr@c2efde2)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.0     2022-06-28 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang         1.0.4      2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.15       2022-08-16 [1] CRAN (R 4.2.1)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.8      2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>  styler        1.7.0      2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble        3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1.9000 2022-08-23 [1] Github (r-lib/vctrs@64d442f)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.32       2022-08-10 [1] CRAN (R 4.2.1)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/Yutani/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

yutannihilation avatar Aug 23 '22 08:08 yutannihilation

Ooops, I think I forgot to pull 😳

hadley avatar Aug 23 '22 21:08 hadley