Model creation fails with updated XGBoost (≥ v2.1)

Open therealjpetereit opened this issue 1 year ago • 6 comments

Hi and a merry 2025 🎉🙌,

I just updated my XGBoost to 2.1.3 and started having problems building models with it.

I assume they made many changes when they updated from 2.0.3 to 2.1+, but I tracked down what broke the code for me.

For me in particular, the xgb.DMatrix function fails when I use the current parsnip version (1.2.1) with tidymodels (v1.2.0).

The reason is the updated function signature in XGBoost:

Old

xgb.DMatrix <- function(data, info = list(), missing = NA, silent = FALSE, nthread = NULL, ...)  

New

xgb.DMatrix <- function(
  data,
  label = NULL,
  weight = NULL,
  base_margin = NULL,
  missing = NA,
  silent = FALSE,
  feature_names = colnames(data),
  feature_types = NULL,
  nthread = NULL,
  group = NULL,
  qid = NULL,
  label_lower_bound = NULL,
  label_upper_bound = NULL,
  feature_weights = NULL,
  data_split_mode = "row"
)

I am not sure what exactly was in the old info = list(), but it was probably all these arguments, which are now passed directly to the function.
This could all be part of their general R interface overhaul, but I thought I'd let you know after spending some time tracking this down.
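
A quick way to see which of the two signatures is installed is to inspect the function's formals. This is only a sketch: has_info_arg(), old_style and new_style are hypothetical names made up for illustration, and the two stand-in functions merely mimic (in shortened form) the signatures quoted above.

```r
# Hypothetical helper: TRUE when a function still accepts an `info` argument.
has_info_arg <- function(fn) "info" %in% names(formals(fn))

# Stand-ins that mimic the old and new signatures (bodies omitted).
old_style <- function(data, info = list(), missing = NA, silent = FALSE,
                      nthread = NULL, ...) NULL
new_style <- function(data, label = NULL, weight = NULL, missing = NA,
                      silent = FALSE, nthread = NULL) NULL

has_info_arg(old_style)
#> [1] TRUE
has_info_arg(new_style)
#> [1] FALSE

# With xgboost installed, the real check would be:
# has_info_arg(xgboost::xgb.DMatrix)
```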

For now I have downgraded to 2.0.3 and will wait until you have time to update the functionality to match the newer XGBoost releases.

Cheers Jakob

therealjpetereit avatar Jan 08 '25 02:01 therealjpetereit

Yes this is indeed a bug. Thanks for catching it!

{parsnip} is currently compatible with {xgboost} version 1.7.8.1, which is the most recent CRAN version. This is happening because the {xgboost} R package on CRAN doesn't match the release versions.

It appears that xgboost is gearing up for another CRAN release https://github.com/dmlc/xgboost/issues/9810 so we should get ready for this.

@therealjpetereit is correct. In this PR https://github.com/dmlc/xgboost/pull/9862, they switched from passing some arguments to xgb.DMatrix() as a named list in info to spelling all of them out as separate arguments. They removed the info argument instead of deprecating it, which gives us the error, because we pass things to info.

Ideally, {xgboost} would have deprecated info more gracefully, so I think we need to do some switching on {xgboost} versions, or update {parsnip} once the new {xgboost} release is out.
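
A rough sketch of what switching on the {xgboost} version could look like. This is not how {parsnip} implements it: dmatrix_args() is a hypothetical helper, and the 2.1.0 cutoff is an assumption based on the report above that 2.0.3 works and 2.1.3 fails.

```r
# Hypothetical helper (not part of parsnip): build the argument list for
# xgb.DMatrix() according to the xgboost version in use.
dmatrix_args <- function(x, y = NULL, weights = NULL,
                         xgb_version = utils::packageVersion("xgboost")) {
  xgb_version <- as.package_version(xgb_version)

  if (xgb_version >= "2.1.0") {
    # Assumed cutoff: from here on, label and weight are top-level arguments.
    args <- list(data = x, label = y, weight = weights, missing = NA)
  } else {
    # Older signature: label and weight travel inside the `info` list.
    info <- list(label = y, weight = weights)
    info <- info[!vapply(info, is.null, logical(1))]
    args <- list(data = x, info = info, missing = NA)
  }

  # Drop arguments that were not supplied.
  args[!vapply(args, is.null, logical(1))]
}

# With xgboost installed, the call would be dispatched like this:
# dmat <- do.call(xgboost::xgb.DMatrix, dmatrix_args(x, y))
```

Passing xgb_version explicitly (for example "2.0.3") makes both branches easy to exercise without having either version installed.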

CRAN versions

library(parsnip)

xgb_spec <- boost_tree() |>
  set_mode("regression") |>
  set_engine("xgboost")

xgb_spec |>
  fit(mpg ~ ., data = mtcars)
#> parsnip model object
#> 
#> ##### xgb.Booster
#> raw: 21.6 Kb 
#> call:
#>   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
#>     colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1, 
#>     subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist, 
#>     verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#>   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "1", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#>   niter
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 10 
#> niter: 15
#> nfeatures : 10 
#> evaluation_log:
#>   iter training_rmse
#>  <num>         <num>
#>      1    14.9313149
#>      2    10.9568064
#>    ---           ---
#>     14     0.5628964
#>     15     0.4603055

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       macOS Sequoia 15.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Los_Angeles
#>  date     2025-01-07
#>  pandoc   3.6.1 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
#>  colorspace    2.1-1   2024-07-26 [1] CRAN (R 4.4.0)
#>  data.table    1.16.4  2024-12-06 [1] CRAN (R 4.4.1)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate      1.0.1   2024-10-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fs            1.6.5   2024-10-30 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
#>  ggplot2       3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  gtable        0.3.6   2024-10-25 [1] CRAN (R 4.4.1)
#>  hardhat       1.4.0   2024-06-02 [1] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  jsonlite      1.8.9   2024-09-20 [1] CRAN (R 4.4.1)
#>  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  lattice       0.22-6  2024-03-20 [2] CRAN (R 4.4.2)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>  Matrix        1.7-1   2024-10-18 [2] CRAN (R 4.4.2)
#>  munsell       0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
#>  parsnip     * 1.2.1   2024-03-22 [1] CRAN (R 4.4.0)
#>  pillar        1.10.0  2024-12-17 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr         1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.49    2024-10-31 [1] CRAN (R 4.4.1)
#>  xgboost       1.7.8.1 2024-07-24 [1] CRAN (R 4.4.0)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Users/emilhvitfeldt/Library/R/arm64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2025-01-07 with reprex v2.1.1

EmilHvitfeldt avatar Jan 08 '25 02:01 EmilHvitfeldt

Yes,

I have been forced to move to GitHub versions of XGBoost for GPU integrations and even AMD GPU versions lately 😅 And who can resist installing the latest versions 😂😂

Cheers J

therealjpetereit avatar Jan 08 '25 03:01 therealjpetereit

Noting that this is likely related to / duplicate of https://github.com/tidymodels/parsnip/issues/1087. :)

simonpcouch avatar Jan 09 '25 17:01 simonpcouch

Appears to be a separate issue, but it will hit us at the same time when they release to CRAN :)

EmilHvitfeldt avatar Jan 09 '25 20:01 EmilHvitfeldt

Noting a tag in https://github.com/dmlc/xgboost/issues/9810#issuecomment-2599763233 that has some higher-level information.

simonpcouch avatar Jan 21 '25 19:01 simonpcouch

For anyone trying to run parsnip with xgboost on the GPU: the combination of parsnip 1.3.1 and xgboost 2.0.3 works.

  1. Download xgboost_r_gpu from the xgboost releases page
  2. Then install in R:
install.packages("xgboost_r_gpu_linux_82d846bbeb83c652a0b1dff0e3519e67569c4a3d.tar.gz", repos = NULL, type = "source")

sjdh avatar May 15 '25 08:05 sjdh

Hi all,

I believe XGBoost on GitHub is now at version 3.0.3, yet the latest CRAN version is 1.7.11.1. The version disparity is becoming rather large =(

Do you plan to adjust the way parsnip interacts with XGBoost based on the newer releases or based on CRAN? I am not sure what the best solution is either. XGBoost up to v2.0.3 still works with the current framework.

Cheers J

therealjpetereit avatar Aug 11 '25 02:08 therealjpetereit

One of the problems is that they have been taking a while to get the CRAN package up to speed with the other versions they have.

At a glance, it looks like they are keeping the CRAN package alive and working without porting over the new features, which is why there have been recent CRAN releases.

For our notes: https://xgboost.readthedocs.io/en/stable/R-package/index.html

Since XGBoost 3.0.0, the latest R package is available on R-universe while the one on CRAN is kept at an older version. We will work on helping the CRAN version to catch up in the future. In the meantime, please use R-universe packages.

They say to install from R-universe, which is currently at version 3.0.4.1: https://dmlc.r-universe.dev/xgboost
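
For reference, installing from R-universe would look roughly like the sketch below. The CRAN mirror URL is an assumption added so that dependencies can still be resolved. Note that, per this thread, versions newer than 2.0.3 do not work with the current parsnip, so this is mainly useful for testing.

```r
# Repositories to search: the dmlc R-universe first, then a CRAN mirror
# (the mirror URL is an assumption; any CRAN mirror should do).
repos <- c(
  dmlc = "https://dmlc.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
)

# Uncomment to install, then check which version was picked up:
# install.packages("xgboost", repos = repos)
# packageVersion("xgboost")
```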

That tripped me up because the DESCRIPTION lists 3.1.0.0. All in all, they just use a different versioning system than we do, which is fine.

In the current state, I imagine it will take some time for them to update CRAN all the way, as the newer versions break existing code because of the changes they made. Without a deprecation period this will be harder on CRAN.

That being said, we could do some switching based on version numbers, as mentioned earlier.

EmilHvitfeldt avatar Aug 11 '25 17:08 EmilHvitfeldt

Hi,

I guess version switching might be the way to go. I think the essential functions are all the same apart from the input for xgb.DMatrix creation.

Let me know if and when you get a chance to implement that. Not urgent.

Cheers J

therealjpetereit avatar Sep 09 '25 04:09 therealjpetereit

Downgrading from xgboost_3.1.1.1 (GPU) with parsnip_1.3.3 to xgboost_2.0.3.1 fixed my problem. I just needed to pass the device in set_engine("xgboost", device = "cuda"). Thx @sjdh!
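
Put together, the spec would look something like this sketch. It assumes a GPU build of xgboost 2.0.3 is installed; the mtcars fit is borrowed from the reprex earlier in the thread and is left commented out because it needs a CUDA device.

```r
library(parsnip)

# Engine arguments given to set_engine() are forwarded to the xgboost fit,
# so `device = "cuda"` asks xgboost to train on the GPU.
xgb_gpu_spec <- boost_tree() |>
  set_mode("regression") |>
  set_engine("xgboost", device = "cuda")

# Requires a GPU build of xgboost and a CUDA device:
# xgb_gpu_fit <- fit(xgb_gpu_spec, mpg ~ ., data = mtcars)
```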

LucasMS avatar Oct 24 '25 13:10 LucasMS