fabletools

Mutate() issue - Hierarchical Forecasting

Open edoardobassett opened this issue 4 years ago • 6 comments

Hi, I am reposting this issue on GitHub, with a more complete example, as I suspect it might not be related to the data being used or code mistakes.

I am trying to perform Hierarchical Forecasting on a dataset that is fundamentally structured in the same way as the tourism tsibble referenced in Forecasting: Principles and Practice, but with more hierarchical levels. However, after the structural aggregation, a mutate() error shows up. The data doesn't contain any missing values.

Below is a reprex of the code, with a minimal version of the data that reproduces the error.

Thanks in advance.

library(fable)
library(dplyr)
library(tsibble)
library(tidyverse)

t_london <- tibble::tribble(
  ~Month,             ~Value.type,   ~LSOA11CD,             ~LSOA11NM,     ~WD19CD,      ~WD19NM,    ~LAD19CD,         ~LAD19NM,           ~CTYNM, ~RGN19NM, ~CNTY21NM, ~NTN21NM, ~Count,
  "2010 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jan", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Feb", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Mar", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Apr", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 May", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jun", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     4L,
  "2011 Jul", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Aug", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Sep", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 Oct", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Nov", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     6L
)

t_london <- t_london %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = c(LSOA11CD, Value.type), index = Month)

london_full <- t_london %>% aggregate_key((NTN21NM/ CNTY21NM / RGN19NM / CTYNM / LAD19NM / WD19NM /LSOA11NM) * Value.type, Total = sum(Count))

fit <- london_full %>%
  model(base = ARIMA(Total)) %>%
  reconcile(
    bu = bottom_up(base),
    ols = min_trace(base, method = "ols"),
    mint = min_trace(base, method = "mint_shrink"),
  )
#> Warning in max(which(abs(ma) > 1e-08)): no non-missing arguments to max;
#> returning -Inf

#> Warning: 16 errors (1 unique) encountered for base
#> [16] argument must be coercible to non-negative integer

fc <- fit %>%
  forecast(h = 5)
#> Warning: Problem with `mutate()` input `mint`.
#> ℹ diag(.) had 0 or NA entries; non-finite result is doubtful
#> ℹ Input `mint` is `(function (object, ...) ...`.
#> Warning: Problem with `mutate()` input `mint`.
#> ℹ diag(.) had 0 or NA entries; non-finite result is doubtful
#> ℹ Input `mint` is `(function (object, ...) ...`.
#> Error: Problem with `mutate()` input `mint`.
#> x infinite or missing values in 'x'
#> ℹ Input `mint` is `(function (object, ...) ...`.

Created on 2021-02-10 by the reprex package (v0.3.0)

edoardobassett avatar Feb 10 '21 21:02 edoardobassett

I am unable to reproduce this issue with the latest versions of the packages. Perhaps try updating to the latest CRAN releases?
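Before updating, it can help to confirm which versions are actually installed. The following is a minimal sketch in base R; `installed_versions()` is a hypothetical helper written for this illustration, not part of fable or fabletools.

```r
# Hypothetical helper: report the installed version of each package,
# or NA if the package is not installed.
installed_versions <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
}

pkgs <- c("fable", "fabletools", "tsibble")
print(installed_versions(pkgs))

# Uncomment to install the current CRAN releases:
# install.packages(pkgs)
```

Comparing this output with the session info of a run that works is a quick way to spot an outdated package.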

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(tidyverse)

t_london <- tibble::tribble(
  ~Month,             ~Value.type,   ~LSOA11CD,             ~LSOA11NM,     ~WD19CD,      ~WD19NM,    ~LAD19CD,         ~LAD19NM,           ~CTYNM, ~RGN19NM, ~CNTY21NM, ~NTN21NM, ~Count,
  "2010 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jan", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Feb", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Mar", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Apr", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 May", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jun", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     4L,
  "2011 Jul", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Aug", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Sep", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 Oct", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Nov", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     6L
)

t_london <- t_london  %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = c(LSOA11CD, Value.type), index=Month)

london_full <- t_london %>% aggregate_key((NTN21NM/ CNTY21NM / RGN19NM / CTYNM / LAD19NM / WD19NM /LSOA11NM) * Value.type, Total = sum(Count))

fit <- london_full %>%
  model(base = ARIMA(Total)) %>%
  reconcile(
    bu = bottom_up(base),
    ols = min_trace(base, method = "ols"),
    mint = min_trace(base, method = "mint_shrink"),
  )

fc <- fit %>%
  forecast(h = 5)
fc
#> # A fable: 320 x 12 [1M]
#> # Key:     NTN21NM, Value.type, CNTY21NM, RGN19NM, CTYNM, LAD19NM, WD19NM,
#> #   LSOA11NM, .model [64]
#>    NTN21NM    Value.type CNTY21NM   RGN19NM    CTYNM      LAD19NM    WD19NM    
#>    <chr*>     <chr*>     <chr*>     <chr*>     <chr*>     <chr*>     <chr*>    
#>  1 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  2 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  3 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  4 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  5 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  6 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  7 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  8 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  9 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#> 10 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#> # … with 310 more rows, and 5 more variables: LSOA11NM <chr*>, .model <chr>,
#> #   Month <mth>, Total <dist>, .mean <dbl>

Created on 2021-02-11 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Ubuntu 20.04.1 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_AU:en                    
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Melbourne         
#>  date     2021-02-11                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date       lib source                            
#>  anytime          0.3.9      2020-08-27 [1] CRAN (R 4.0.2)                    
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.0.2)                    
#>  backports        1.2.1      2020-12-09 [1] CRAN (R 4.0.2)                    
#>  blob             1.2.1      2020-01-20 [1] CRAN (R 4.0.2)                    
#>  broom            0.7.0      2020-07-09 [1] CRAN (R 4.0.2)                    
#>  callr            3.5.1      2020-10-13 [1] CRAN (R 4.0.2)                    
#>  cellranger       1.1.0      2016-07-27 [1] CRAN (R 4.0.2)                    
#>  cli              2.3.0      2021-01-31 [1] CRAN (R 4.0.2)                    
#>  colorspace       2.0-0      2020-11-11 [1] CRAN (R 4.0.2)                    
#>  crayon           1.4.0      2021-01-30 [1] CRAN (R 4.0.2)                    
#>  DBI              1.1.0      2019-12-15 [1] CRAN (R 4.0.2)                    
#>  dbplyr           1.4.4      2020-05-27 [1] CRAN (R 4.0.2)                    
#>  desc             1.2.0      2018-05-01 [1] CRAN (R 4.0.2)                    
#>  devtools         2.3.2      2020-09-18 [1] CRAN (R 4.0.2)                    
#>  digest           0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                    
#>  distributional   0.2.1      2020-10-06 [1] CRAN (R 4.0.2)                    
#>  dplyr          * 1.0.4      2021-02-02 [1] CRAN (R 4.0.2)                    
#>  ellipsis         0.3.1      2020-05-15 [1] CRAN (R 4.0.2)                    
#>  evaluate         0.14       2019-05-28 [1] CRAN (R 4.0.2)                    
#>  fable          * 0.3.0      2021-02-02 [1] local                             
#>  fabletools     * 0.3.0.9000 2021-02-02 [1] local                             
#>  fansi            0.4.2      2021-01-15 [1] CRAN (R 4.0.2)                    
#>  farver           2.0.3      2020-01-16 [1] CRAN (R 4.0.2)                    
#>  feasts           0.1.7      2021-02-08 [1] local                             
#>  forcats        * 0.5.1      2021-01-27 [1] CRAN (R 4.0.2)                    
#>  fs               1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                    
#>  generics         0.1.0      2020-10-31 [1] CRAN (R 4.0.2)                    
#>  ggplot2        * 3.3.3      2020-12-30 [1] CRAN (R 4.0.2)                    
#>  glue             1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                    
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.0.2)                    
#>  haven            2.3.1      2020-06-01 [1] CRAN (R 4.0.2)                    
#>  highr            0.8        2019-03-20 [1] CRAN (R 4.0.2)                    
#>  hms              1.0.0      2021-01-13 [1] CRAN (R 4.0.2)                    
#>  htmltools        0.5.1      2021-01-12 [1] CRAN (R 4.0.2)                    
#>  httr             1.4.2      2020-07-20 [1] CRAN (R 4.0.2)                    
#>  jsonlite         1.7.2      2020-12-09 [1] CRAN (R 4.0.2)                    
#>  knitr            1.30       2020-09-22 [1] CRAN (R 4.0.2)                    
#>  lattice          0.20-41    2020-04-02 [2] CRAN (R 4.0.2)                    
#>  lifecycle        0.2.0      2020-03-06 [1] CRAN (R 4.0.2)                    
#>  lubridate        1.7.9.2    2020-11-13 [1] CRAN (R 4.0.2)                    
#>  magrittr         2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                    
#>  Matrix           1.2-18     2019-11-27 [2] CRAN (R 4.0.2)                    
#>  memoise          1.1.0      2017-04-21 [1] CRAN (R 4.0.2)                    
#>  modelr           0.1.8      2020-05-19 [1] CRAN (R 4.0.2)                    
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.0.2)                    
#>  nlme             3.1-148    2020-05-24 [2] CRAN (R 4.0.2)                    
#>  pillar           1.4.7      2020-11-20 [1] CRAN (R 4.0.2)                    
#>  pkgbuild         1.2.0      2020-12-15 [1] CRAN (R 4.0.2)                    
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.0.2)                    
#>  pkgload          1.1.0      2020-05-29 [1] CRAN (R 4.0.2)                    
#>  prettyunits      1.1.1      2020-01-24 [1] CRAN (R 4.0.2)                    
#>  processx         3.4.5      2020-11-30 [1] CRAN (R 4.0.2)                    
#>  progressr        0.7.0      2020-12-11 [1] CRAN (R 4.0.2)                    
#>  ps               1.5.0      2020-12-05 [1] CRAN (R 4.0.2)                    
#>  purrr          * 0.3.4      2020-04-17 [1] CRAN (R 4.0.2)                    
#>  R6               2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                    
#>  Rcpp             1.0.6      2021-01-15 [1] CRAN (R 4.0.2)                    
#>  readr          * 1.4.0      2020-10-05 [1] CRAN (R 4.0.2)                    
#>  readxl           1.3.1      2019-03-13 [1] CRAN (R 4.0.2)                    
#>  remotes          2.2.0      2020-07-21 [1] CRAN (R 4.0.2)                    
#>  reprex           0.3.0      2019-05-16 [1] CRAN (R 4.0.2)                    
#>  rlang            0.4.10     2020-12-30 [1] CRAN (R 4.0.2)                    
#>  rmarkdown        2.6        2020-12-14 [1] CRAN (R 4.0.2)                    
#>  rprojroot        2.0.2      2020-11-15 [1] CRAN (R 4.0.2)                    
#>  rvest            0.3.6      2020-07-25 [1] CRAN (R 4.0.2)                    
#>  scales           1.1.1      2020-05-11 [1] CRAN (R 4.0.2)                    
#>  sessioninfo      1.1.1      2018-11-05 [1] CRAN (R 4.0.2)                    
#>  stringi          1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                    
#>  stringr        * 1.4.0      2019-02-10 [1] CRAN (R 4.0.2)                    
#>  testthat         3.0.1      2020-12-17 [1] CRAN (R 4.0.2)                    
#>  tibble         * 3.0.6      2021-01-29 [1] CRAN (R 4.0.2)                    
#>  tidyr          * 1.1.2      2020-08-27 [1] CRAN (R 4.0.2)                    
#>  tidyselect       1.1.0      2020-05-11 [1] CRAN (R 4.0.2)                    
#>  tidyverse      * 1.3.0      2019-11-21 [1] CRAN (R 4.0.2)                    
#>  tsibble        * 1.0.0      2021-02-05 [1] Github (tidyverts/tsibble@722cc86)
#>  urca             1.3-0      2016-09-06 [1] CRAN (R 4.0.2)                    
#>  usethis          1.6.3      2020-09-17 [1] CRAN (R 4.0.2)                    
#>  utf8             1.1.4      2018-05-24 [1] CRAN (R 4.0.2)                    
#>  vctrs            0.3.6      2020-12-17 [1] CRAN (R 4.0.2)                    
#>  withr            2.4.1      2021-01-26 [1] CRAN (R 4.0.2)                    
#>  xfun             0.20       2021-01-06 [1] CRAN (R 4.0.2)                    
#>  xml2             1.3.2      2020-04-23 [1] CRAN (R 4.0.2)                    
#>  yaml             2.2.1      2020-02-01 [1] CRAN (R 4.0.2)                    
#> 
#> [1] /home/mitchell/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /opt/R/4.0.0/lib/R/library

mitchelloharawild avatar Feb 11 '21 07:02 mitchelloharawild

Great, the issue is solved with the latest version of the packages. Thank you very much!

edoardobassett avatar Feb 11 '21 11:02 edoardobassett

The problem reappeared when using the whole dataset. I tried to capture some of the rows that seem to be part of the issue; you will find them in the new reprex. All the packages used are at their latest versions.

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tsibble)
library(tidyverse)

t_london <- tibble::tribble(
  ~Month,   ~Value.type,             ~LSOA11NM,     ~WD19CD,             ~WD19NM,         ~LAD19NM,           ~CTYNM, ~RGN19NM, ~CNTY21NM, ~NTN21NM, ~Count,
  "2016 Dec", "Value-Type2", "City of London 001A", "E05009288",        "Aldersgate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2017 Jan", "Value-Type2", "City of London 001A", "E05009288",        "Aldersgate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2016 Dec", "Value-Type2", "City of London 001B", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2017 Jan", "Value-Type2", "City of London 001B", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2016 Dec", "Value-Type2", "City of London 001C", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2017 Jan", "Value-Type2", "City of London 001C", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2016 Dec", "Value-Type2", "City of London 001E", "E05009308",         "Portsoken", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2017 Jan", "Value-Type2", "City of London 001E", "E05009308",         "Portsoken", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2016 Dec", "Value-Type2", "City of London 001F", "E05009311",            "Vintry", "City of London", "City Of London", "London", "England",     "UK",    54L,
  "2017 Jan", "Value-Type2", "City of London 001F", "E05009311",            "Vintry", "City of London", "City Of London", "London", "England",     "UK",    62L,
  "2016 Dec", "Value-Type2", "City of London 001G", "E05009304", "Farringdon Within", "City of London", "City Of London", "London", "England",     "UK",    12L,
  "2017 Jan", "Value-Type2", "City of London 001G", "E05009304", "Farringdon Within", "City of London", "City Of London", "London", "England",     "UK",     9L
)

t_london <- t_london %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = c(LSOA11NM, Value.type), index = Month)

london_full <- t_london %>% aggregate_key((NTN21NM/ CNTY21NM / RGN19NM / CTYNM / LAD19NM / WD19NM /LSOA11NM) * Value.type, Total = sum(Count))

fit <- london_full %>%
  model(base = ARIMA(Total)) %>%
  reconcile(
    bu = bottom_up(base),
    ols = min_trace(base, method = "ols"),
    mint = min_trace(base, method = "mint_shrink"),
  )
#> Warning: 6 errors (1 unique) encountered for base
#> [6] missing value where TRUE/FALSE needed

fc <- fit %>%
  forecast(h = 1)
#> Warning in cov2cor(covm): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#> Warning in cov2cor(tar): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#> Error: Problem with `mutate()` input `mint`.
#> x infinite or missing values in 'x'
#> ℹ Input `mint` is `(function (object, ...) ...`.

Created on 2021-02-12 by the reprex package (v1.0.0)
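The `cov2cor()` warnings in the output above can be reproduced in isolation with base R. The sketch below uses a toy 2 x 2 covariance matrix (invented for illustration, not taken from the reprex) in which one series has zero variance. How such a matrix arises inside `min_trace()` is not shown in this thread, so treat the link to the reprex as an assumption.

```r
# Toy covariance matrix: series "a" has zero variance, series "b" has variance 4.
covm <- matrix(c(0, 0,
                 0, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("a", "b"), c("a", "b")))

warned <- FALSE
corm <- withCallingHandlers(
  cov2cor(covm),
  warning = function(w) {
    # Record the warning and print its message instead of letting it propagate.
    warned <<- TRUE
    message("cov2cor warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(corm)
# The off-diagonal entries are non-finite, so any later step that
# requires finite values will fail.
print(anyNA(corm))
```

The exact wording of the warning differs between R versions, which is why the sketch only records that a warning was raised rather than matching its text.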

edoardobassett avatar Feb 12 '21 17:02 edoardobassett

I get the same issue. Are there any updates on this?

wdzhy123 avatar Dec 16 '21 20:12 wdzhy123

Are there any updates on this, @mitchelloharawild?

Thanks in advance!

slava-keshkov avatar Jul 02 '22 15:07 slava-keshkov

Hi, please provide a minimal reproducible example. I've just tried reproducing the example above, and it fails because the ARIMA models are being trained on just 2 observations per series. More data is required to produce sensible output.
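A quick way to catch this before modelling is to count the observations in each series. The sketch below uses base R on a three-series subset of the rows from the reprex above; the threshold `min_obs` is a hypothetical value chosen for illustration, not a requirement documented by fable.

```r
# Subset of the second reprex: two monthly observations per series.
obs <- data.frame(
  Month    = rep(c("2016 Dec", "2017 Jan"), times = 3),
  LSOA11NM = rep(c("City of London 001A",
                   "City of London 001B",
                   "City of London 001F"), each = 2),
  Count    = c(0L, 0L, 1L, 1L, 54L, 62L)
)

# Hypothetical minimum series length (two years of monthly data);
# adjust to suit the model being fitted.
min_obs <- 24L

# Number of observations per bottom-level series.
n_per_series <- table(obs$LSOA11NM)
print(n_per_series)

# Series that are too short to model under the assumed threshold.
too_short <- names(n_per_series)[n_per_series < min_obs]
print(too_short)
```

Here every series has only 2 observations, so all three are flagged.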

mitchelloharawild avatar Jul 04 '22 03:07 mitchelloharawild