
fmt_date (and vec_fmt_date) are especially slow (bigD::fdt is not vectorised)

Open slodge opened this issue 1 year ago • 0 comments

Proposal

fmt_date is very slow even on short columns (see the 100-cell example below).

Possible proposals:

  • change bigD so that it doesn't iterate over the input vector(s) element by element - see the loop at https://github.com/rstudio/bigD/blob/8fe7727a0c394bd6480c48072f104d14c91058c9/R/fdt.R#L963
    • this might also require changing fmt_date/bigD to allow the timezone to be specified at a vector level?
  • change bigD so it doesn't perform so many unnecessary formatting operations
    • e.g. when y is present, there appear to be 8 calls to ug_sub, which in turn means 8 calls to dt_y* functions plus (on my Windows machine at least) many Unicode encoding operations
      • the {y} calls https://github.com/rstudio/bigD/blob/8fe7727a0c394bd6480c48072f104d14c91058c9/R/fdt.R#L1171C4-L1181C6
      • the ug_sub implementation - https://github.com/rstudio/bigD/blob/8fe7727a0c394bd6480c48072f104d14c91058c9/R/fdt.R#L1539C1-L1554C2
  • provide an alternative fmt_date function to allow gt users to switch date formatting away from bigD
  • provide a performance warning for anyone using fmt_date on more than a few columns/rows
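
To illustrate the first proposal, here is a minimal base R sketch of the difference between formatting element by element and formatting a whole vector in one call. It does not use bigD or reproduce its internals; `format_each()` and `format_all()` are hypothetical helpers that exist only for this comparison, and base `format()` stands in for the real formatter.

```r
# Illustration only: base R formatting, not bigD internals.
# 100 date-times one day apart, in a fixed timezone.
x <- as.POSIXct("2024-01-12 08:00:00", tz = "UTC") + 86400 * (0:99)

# Element-by-element: one format() call per value.
format_each <- function(x, pattern) {
  vapply(seq_along(x), function(i) format(x[i], pattern), character(1))
}

# Vectorised: a single format() call for the whole vector.
format_all <- function(x, pattern) {
  format(x, pattern)
}

pattern <- "%b %d, %Y"
looped <- format_each(x, pattern)
vectorised <- format_all(x, pattern)

# Both approaches produce the same strings; only the call count differs.
stopifnot(identical(looped, vectorised))

# Timings vary by machine, so none are quoted here.
system.time(for (k in 1:50) format_each(x, pattern))
system.time(for (k in 1:50) format_all(x, pattern))
```

The per-call overhead is paid once per element in the looped version, which is the pattern the linked loop in fdt.R appears to follow.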

fmt_datetime may also need a look, as it uses bigD for some (but not all) of its formatting.

Example

devtools::load_all()
library(tidyverse)
library(tictoc)
library(gt)

demo_length <- 100
my_table <-
  tibble(
    # generate random names
    Name = stringi::stri_rand_strings(demo_length, 10),
    Population = round(runif(demo_length, 100000000, 1000000000)),
    Score = runif(demo_length, 0, 1),
    # date-times within +/- 1000 seconds of now
    Date = lubridate::now() + runif(demo_length, -1000, 1000)
  )

tictoc::tic("fmt_")
html1 <- my_table |> 
  gt() |> 
  fmt_number(columns = "Population") |>
  fmt_date(columns = "Date", date_style = "m_day_year") |> 
  gt:::as.tags.gt_tbl()
tictoc::toc()

tictoc::tic("vec_fmt_")
html2 <- my_table |> 
  mutate(Date = vec_fmt_date(Date, date_style = "m_day_year")) |>
  gt() |> 
  fmt_number(columns = "Population") |>
  gt:::as.tags.gt_tbl()
tictoc::toc()

tictoc::tic("mutate")
html3 <- my_table |>
  # approximate base R equivalent of m_day_year (four-digit year)
  mutate(Date = format(Date, "%b %d, %Y")) |>
  gt() |> 
  fmt_number(columns = "Population") |>
  gt:::as.tags.gt_tbl()
tictoc::toc()

This formats only 100 date cells and gives the following output:

fmt_: 1.81 sec elapsed
vec_fmt_: 1.67 sec elapsed
mutate: 0.08 sec elapsed

profvis shows bigD::fdt dominating the time spent in the slower cases. [two profvis screenshots]

Note: This situation is even worse in interactive where #1528 multiplies the problem
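
Until the formatter itself is faster, one possible user-side mitigation (not one of the proposals above) is to format each distinct value only once and map the results back to the original positions. The sketch below assumes a hypothetical helper, `format_distinct()`, and uses base `format()` as the formatter; in principle any vectorised formatter, such as `vec_fmt_date()`, could be passed instead, though that has not been tested here. It only helps when the column contains repeated values. In the example above every date-time is distinct, so the values would first need to be reduced to dates, for instance with `as.Date(x, tz = ...)`.

```r
# Hypothetical helper: call `formatter` on distinct values only,
# then expand the results back to the full length of `x`.
format_distinct <- function(x, formatter) {
  key <- as.numeric(x)          # numeric key works for Date and POSIXct
  first <- !duplicated(key)     # first occurrence of each distinct value
  formatted <- formatter(x[first])
  formatted[match(key, key[first])]
}

# Stand-in formatter (an assumption for this sketch, not gt's API).
base_formatter <- function(d) format(d, "%Y-%m-%d")

# Five dates, three of them distinct.
dates <- as.Date("2024-01-10") + c(0, 1, 0, 2, 1)
result <- format_distinct(dates, base_formatter)

# Same output as formatting every element directly.
stopifnot(identical(result, base_formatter(dates)))
```

Here the formatter sees three values instead of five; the saving grows with the amount of repetition in the column.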

slodge, Jan 12 '24 08:01