gt
fmt_date (and vec_fmt_date) are especially slow (bigD::fdt is not vectorised)
Prework
- [x] Read and abide by gt's code of conduct and contributing guidelines.
- [x] Search for duplicates among the existing issues (both open and closed).
Proposal
`fmt_date` is very slow when applied to columns of any length (e.g. see the example of 100 cells below).
Possible proposals:
- change bigD so that it doesn't iterate over the input vector(s) - see loop on https://github.com/rstudio/bigD/blob/8fe7727a0c394bd6480c48072f104d14c91058c9/R/fdt.R#L963
  - this might also require changing `fmt_date`/bigD to allow the timezone to be specified at a vector level?
- change bigD so it doesn't perform so many unnecessary formatting operations
  - e.g. when `y` is present then it seems like there are 8 calls to `ug_sub`, which in turn means 8 calls to `dt_y*` functions plus (on my Windows machine at least) lots of unicode encoding operations
    - the `{y}` calls: https://github.com/rstudio/bigD/blob/8fe7727a0c394bd6480c48072f104d14c91058c9/R/fdt.R#L1171C4-L1181C6
    - the `ug_sub` implementation: https://github.com/rstudio/bigD/blob/8fe7727a0c394bd6480c48072f104d14c91058c9/R/fdt.R#L1539C1-L1554C2
- provide an alternative fmt_date function to allow gt users to switch date formatting away from bigD
- provide a performance warning for anyone using fmt_date on more than a few columns/rows
It might also be necessary to look at `fmt_datetime`, as that uses bigD for some formatting (but not all).
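To illustrate the first proposal, here is a minimal base R sketch of the difference between formatting one element at a time and making a single vectorised call. It does not use bigD at all: `format_one_by_one` and `format_vectorised` are hypothetical helpers written for this illustration, and base `format()` stands in for whatever formatter is actually used.

```r
# Hypothetical helper: formats one element per call, mimicking a
# per-element loop over the input vector.
format_one_by_one <- function(x, fmt = "%b %d, %Y") {
  vapply(seq_along(x), function(i) format(x[i], format = fmt), character(1))
}

# Hypothetical helper: a single vectorised call over the whole vector.
format_vectorised <- function(x, fmt = "%b %d, %Y") {
  format(x, format = fmt)
}

dates <- as.Date("2024-01-01") + 0:999

looped <- format_one_by_one(dates)
vectorised <- format_vectorised(dates)

# Both approaches return the same strings; only the number of calls differs.
identical(looped, vectorised)

# Rough timings; the exact numbers depend on the machine.
system.time(format_one_by_one(dates))
system.time(format_vectorised(dates))
```

The per-element version pays the fixed cost of each call once per cell, which is the pattern the linked loop in `fdt.R` appears to follow.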
Example
devtools::load_all()
library(tidyverse)
library(tictoc)
library(gt)

demo_length <- 100

my_table <-
  tibble(
    # generate random names
    Name = stringi::stri_rand_strings(demo_length, 10),
    Population = round(runif(demo_length, 100000000, 1000000000)),
    Score = runif(demo_length, 0, 1),
    Date = lubridate::now() + runif(demo_length, -1000, 1000)
  )

# Case 1: format the dates inside gt with fmt_date()
tictoc::tic("fmt_")
html1 <- my_table |>
  gt() |>
  fmt_number(columns = "Population") |>
  fmt_date(columns = "Date", date_style = "m_day_year") |>
  gt:::as.tags.gt_tbl()
tictoc::toc()

# Case 2: format the dates up front with vec_fmt_date()
tictoc::tic("vec_fmt_")
html2 <- my_table |>
  mutate(Date = vec_fmt_date(Date, date_style = "m_day_year")) |>
  gt() |>
  fmt_number(columns = "Population") |>
  gt:::as.tags.gt_tbl()
tictoc::toc()

# Case 3: format the dates up front with base format(), bypassing bigD
tictoc::tic("mutate")
html3 <- my_table |>
  mutate(Date = format(Date, "%b %d, %y")) |>
  gt() |>
  fmt_number(columns = "Population") |>
  gt:::as.tags.gt_tbl()
tictoc::toc()
This formats only 100 date cells and gives output:
fmt_: 1.81 sec elapsed
vec_fmt_: 1.67 sec elapsed
mutate: 0.08 sec elapsed
profvis shows `bigD::fdt` dominating the time spent in the slower cases.
Note: this situation is even worse in interactive use, where #1528 multiplies the problem.
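Until the formatter itself is faster, one possible user-side stopgap is to run it once per distinct value and map the results back to the full column. The sketch below is only an illustration under assumptions: `format_unique` and `slow_formatter` are hypothetical names, base `format()` stands in for the slow formatter, and the approach only helps when many cells share the same value (for example, datetimes converted to calendar dates first).

```r
# Hypothetical helper: call an expensive formatter once per distinct value,
# then expand the results back to the original positions.
format_unique <- function(x, formatter) {
  u <- unique(x)
  formatter(u)[match(x, u)]
}

# 100 cells drawn from only 10 distinct dates.
dates <- as.Date("2024-01-01") + sample(0:9, 100, replace = TRUE)

# Stand-in for a slow formatter; it counts how many values it was asked
# to format so the saving is visible.
calls <- 0L
slow_formatter <- function(x) {
  calls <<- calls + length(x)
  format(x, format = "%b %d, %Y")
}

out <- format_unique(dates, slow_formatter)

# Same result as formatting every cell directly.
identical(out, format(dates, format = "%b %d, %Y"))

# Number of values actually formatted: at most 10 rather than 100.
calls
```

This does not address the per-call overhead described in the proposals above; it only reduces how many values reach the formatter.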