Model creation fails with updated XGBoost (≥ v2.1)
Hi and a merry 2025 🎉🙌,
I just updated my XGBoost to 2.1.3 and started having problems building models with it.
I assume they made many changes when they updated from 2.0.3 to 2.1+, but I tracked down what broke the code for me.
For me in particular, the xgb.DMatrix() function fails when I use the current parsnip version (1.2.1) with tidymodels (v1.2.0).
The reason is the updated function signature in XGBoost:
Old:
xgb.DMatrix <- function(data, info = list(), missing = NA, silent = FALSE, nthread = NULL, ...)

New:
xgb.DMatrix <- function(
  data,
  label = NULL,
  weight = NULL,
  base_margin = NULL,
  missing = NA,
  silent = FALSE,
  feature_names = colnames(data),
  feature_types = NULL,
  nthread = NULL,
  group = NULL,
  qid = NULL,
  label_lower_bound = NULL,
  label_upper_bound = NULL,
  feature_weights = NULL,
  data_split_mode = "row"
)
I am not sure exactly what went into the old info = list(), but probably all of these arguments, which are now passed to the function directly.
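To illustrate the change (a minimal sketch, not parsnip's internals; the mtcars data and label are just for demonstration):

# Sketch of the interface change; the data/label choice is illustrative only
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

# xgboost <= 2.0.x: extra fields could be supplied through the `info` list
dtrain_old <- xgboost::xgb.DMatrix(data = x, info = list(label = y))

# xgboost >= 2.1: each field is its own named argument; `info` is gone
dtrain_new <- xgboost::xgb.DMatrix(data = x, label = y)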
This could all be part of their general R interface overhaul, but I thought I'd let you know after spending some time tracking this down.
For now I'll downgrade to 2.0.3 and wait until you've had time to update the functionality to match the newer XGBoost releases.
Cheers Jakob
Yes this is indeed a bug. Thanks for catching it!
{parsnip} is currently compatible with {xgboost} version 1.7.8.1, which is the most recent CRAN version. This is happening because the {xgboost} R package on CRAN doesn't match the release versions.
It appears that xgboost is gearing up for another CRAN release (https://github.com/dmlc/xgboost/issues/9810), so we should get ready for this.
@therealjpetereit is correct: in this PR https://github.com/dmlc/xgboost/pull/9862, they switched from passing some arguments to xgb.DMatrix() as a named list via info to spelling all of them out as individual arguments. They removed the info argument instead of deprecating it, which is what produces the error, since we pass things through info.
Ideally, {xgboost} would have deprecated info a little more robustly, so I think we either need to do some switching on {xgboost} versions or update {parsnip} once the new {xgboost} release is out.
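A rough sketch of what that switching could look like (a hypothetical helper, not parsnip's actual code; the 2.1.0 cutoff is based on the report above):

# Hypothetical helper: pick the xgb.DMatrix() interface based on the installed version
as_xgb_dmatrix <- function(x, y) {
  if (utils::packageVersion("xgboost") >= "2.1.0") {
    # newer interface: fields are individual named arguments
    xgboost::xgb.DMatrix(data = x, label = y)
  } else {
    # older interface: fields travel through the `info` list
    xgboost::xgb.DMatrix(data = x, info = list(label = y))
  }
}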
CRAN versions
library(parsnip)
xgb_spec <- boost_tree() |>
set_mode("regression") |>
set_engine("xgboost")
xgb_spec |>
fit(mpg ~ ., data = mtcars)
#> parsnip model object
#>
#> ##### xgb.Booster
#> raw: 21.6 Kb
#> call:
#> xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0,
#> colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1,
#> subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist,
#> verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#> eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "1", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#> niter
#> callbacks:
#> cb.evaluation.log()
#> # of features: 10
#> niter: 15
#> nfeatures : 10
#> evaluation_log:
#> iter training_rmse
#> <num> <num>
#> 1 14.9313149
#> 2 10.9568064
#> --- ---
#> 14 0.5628964
#> 15 0.4603055
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.2 (2024-10-31)
#> os macOS Sequoia 15.2
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Los_Angeles
#> date 2025-01-07
#> pandoc 3.6.1 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
#> colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.4.0)
#> data.table 1.16.4 2024-12-06 [1] CRAN (R 4.4.1)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
#> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
#> evaluate 1.0.1 2024-10-10 [1] CRAN (R 4.4.1)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
#> fs 1.6.5 2024-10-30 [1] CRAN (R 4.4.1)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
#> ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
#> gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.1)
#> hardhat 1.4.0 2024-06-02 [1] CRAN (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
#> knitr 1.49 2024-11-08 [1] CRAN (R 4.4.1)
#> lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.2)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
#> Matrix 1.7-1 2024-10-18 [2] CRAN (R 4.4.2)
#> munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
#> parsnip * 1.2.1 2024-03-22 [1] CRAN (R 4.4.0)
#> pillar 1.10.0 2024-12-17 [1] CRAN (R 4.4.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
#> rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
#> scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
#> withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
#> xfun 0.49 2024-10-31 [1] CRAN (R 4.4.1)
#> xgboost 1.7.8.1 2024-07-24 [1] CRAN (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
#>
#> [1] /Users/emilhvitfeldt/Library/R/arm64/4.4/library
#> [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Created on 2025-01-07 with reprex v2.1.1
Yes,
I have been forced to move to GitHub versions of XGBoost for GPU integrations and even AMD GPU versions lately 😅 And who can resist installing the latest versions 😂😂
Cheers J
Noting that this is likely related to / duplicate of https://github.com/tidymodels/parsnip/issues/1087. :)
Appears to be a separate issue, but it will hit us at the same time when they release to CRAN :)
Noting a tag in https://github.com/dmlc/xgboost/issues/9810#issuecomment-2599763233 that has some higher-level information.
For anyone trying to run parsnip with xgboost on the GPU: the combination of parsnip 1.3.1 and xgboost 2.0.3 works.
- Download xgboost_r_gpu from the xgboost releases page
- Then install in R:
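# Install the downloaded GPU build from the local file; repos = NULL with type = "source" installs the tarball directly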
install.packages("xgboost_r_gpu_linux_82d846bbeb83c652a0b1dff0e3519e67569c4a3d.tar.gz", repos = NULL, type = "source")
Hi all,
I believe XGBoost on GitHub is now on version 3.0.3, yet the latest CRAN version is 1.7.11.1. The version disparity seems to be getting rather large =(
Do you plan to adjust the way parsnip interacts with XGBoost based on the newer releases or based on CRAN? I am not sure what the best solution is either. XGBoost up to v2.0.3 still works with the current framework.
Cheers J
One of the problems is that they have been taking a while to get the CRAN package up to speed with the other versions they have.
At a glance, it appears that they are keeping it alive and working on CRAN without moving over the new features, which is why there have been recent releases.
For our notes: https://xgboost.readthedocs.io/en/stable/R-package/index.html
Since XGBoost 3.0.0, the latest R package is available on R-universe while the one on CRAN is kept at an older version. We will work on helping the CRAN version to catch up in the future. In the meantime, please use R-universe packages.
And they say to install from R-universe, which is currently at version 3.0.4.1: https://dmlc.r-universe.dev/xgboost
That tripped me up because the DESCRIPTION notes 3.1.0.0. All in all, they just use a different versioning system than we do, which is fine.
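For reference, installing from that R-universe repo usually looks something like this (a sketch based on the link above, not an official recipe):

# Install the development xgboost from dmlc's R-universe instead of CRAN
install.packages(
  "xgboost",
  repos = c("https://dmlc.r-universe.dev", "https://cloud.r-project.org")
)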
In its current state, I imagine it would take some time for them to update CRAN all the way, since the newer versions introduce breakages because of the changes they made. Without a deprecation period this will be harder to do on CRAN.
That being said, we could do some switching based on version numbers, as mentioned earlier.
Hi,
I guess version switching might be the way to go. I think the essential functions are all the same, apart from the inputs for xgb.DMatrix() creation.
Let me know if and when you get a chance to implement that. Not urgent.
Cheers J
Downgrading from xgboost_3.1.1.1 (GPU) with parsnip_1.3.3 to xgboost_2.0.3.1 fixed my problem. I just needed to add the device to the set_engine() call: set_engine("xgboost", device = "cuda"). Thx @sjdh!
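For anyone else wiring this up, the spec described above looks roughly like this (a sketch; assumes a GPU-enabled xgboost build is installed):

library(parsnip)

# Engine arguments in set_engine() are passed on to xgboost, so `device = "cuda"` requests GPU training
xgb_gpu_spec <- boost_tree() |>
  set_mode("regression") |>
  set_engine("xgboost", device = "cuda")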