LightGBM
[R-package] lgb.importance(): Error: R character strings are limited to 2^31-1 bytes
Hello!
I built a LightGBM model and then ran `lgb.importance(model)`. R shows the error: `Error: R character strings are limited to 2^31-1 bytes`.
How do I solve this error? Thank you!
Thanks for using LightGBM.
An error message alone is not enough information for us to help you. Please provide the following:
- version of R
- version of `{lightgbm}`
- how you installed LightGBM
- operating system
- output of running `sessionInfo()` in your R session (if possible)
- a minimal, reproducible example that generates this error (docs with some guidance on that)
Here's an example of how to create a reproducible example for the R package: https://github.com/microsoft/LightGBM/issues/4721#issue-1036595701
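For instance, a self-contained script along these lines, using only simulated data, is something anyone could run as-is. This is just a sketch of the shape a repro should take; the dataset shape and parameters here are placeholders, not your actual setup:

```r
library(lightgbm)

# simulate a small regression dataset (placeholder values)
set.seed(708)
X <- matrix(rnorm(1000L * 10L), ncol = 10L)
y <- rnorm(1000L)
dtrain <- lgb.Dataset(data = X, label = y)

# train a small model, then compute feature importance
model <- lgb.train(
  params = list(objective = "regression"),
  data = dtrain,
  nrounds = 10L
)
lgb.importance(model)
```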
- version of R: 4.3.2
- version of `{lightgbm}`: 3.3.5
- how you installed LightGBM: Tool>Install package
- operating system: Windows
- output of running `sessionInfo()` in my R session:

```
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
```
Thank you for that!
> Tool>Install package

What does this mean?
> version of {lightgbm}: 3.3.5

Can you please update to the latest version (v4.3.0) from CRAN and try again?
```r
install.packages("lightgbm", repos = "https://cran.r-project.org")
```
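After installing, restart your R session and confirm which version is actually loaded (a quick check with base R):

```r
# prints the installed package version; expected to show '4.3.0' after updating
packageVersion("lightgbm")
```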
And after that...we still won't be able to help much without a minimal, reproducible example.
My lightgbm package version is 3.3.5, and it still gives the error. I will try to put together a reproducible example.
```r
# train model
model <- lgb.train(
  params = list(objective = "regression", num_iterations = 100, metric = "l2",
                min_data = 1L, min_data_in_bin = 100, min_gain_to_split = 10),
  data = train,
  nrounds = 100
)
lgb_imp <- lgb.importance(model)
```
However, when I decrease `num_iterations`, the error does not appear.
Sorry if my placement was confusing. I'm asking what "Tool>Install package" means. Are those buttons you're clicking in an application? If so, what application?
> train model

Thanks for this! But it is not a reproducible example.

Crucially... what does `train` contain? Much of LightGBM's behavior (like any machine learning framework) is dependent on the size, shape, and distribution of the input data.
For example, based only on the error message you've provided, I can think of a few possibilities:
- your data has very long feature names
- your data has a very large number of rows
- you have a very large number of features
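One rough way to gauge whether the model's text representation is approaching that 2^31-1 byte limit is to check the size of the model saved to disk (a sketch; note that `lgb.dump()` produces JSON, which is larger than the saved `.txt` model format, so treat this as a lower bound):

```r
# save the model's text representation to disk and check its size in bytes;
# a file size anywhere near 2^31 suggests functions that build one big
# in-memory string, like lgb.dump() and lgb.importance(), will overflow
lgb.save(model, "model.txt")
file.size("model.txt")
```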
If you can't provide a reproducible example, can you please at least show the code you used to construct `train`? Include any code for reading in data from files, databases, etc.
And report the size of the dataset (number of rows, number of columns, exact feature names if there are any).
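For example (a sketch, assuming the raw matrix or data.frame you built `train` from is called `data`; adjust the name to match your code):

```r
nrow(data)      # number of rows
ncol(data)      # number of columns
colnames(data)  # exact feature names
```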
Yes, I click those buttons in RStudio to install the lightgbm package.
```r
data <- readRDS("D:/data.rds")
dtrain <- lgb.Dataset(data = as.matrix(data[, -1]), label = data[, 1])
model <- lgb.train(
  params = list(objective = "regression", num_iterations = 100, metric = "l2",
                min_data = 1L, min_data_in_bin = 100, min_gain_to_split = 10),
  data = dtrain,
  nrounds = 100
)
lgb_imp <- lgb.importance(model)
```
My dataset has 4,610,000 rows and 21 columns.
Hi @jameslamb, I'm having the same issue when running `lightgbm::lgb.model.dt.tree(lgb_model)`, and I believe the culprit is here: `lgb.dump(booster = model, num_iteration = num_iteration)`. My dataset is also very large, with a large number of rows and columns, and is fit with a complex model (i.e., not easy to share).
It looks like `lgb.dump` tries to return a single long character string when `num_iteration = NULL`. I wonder if it could instead be built up as a large list or something, i.e., `lapply(1:booster$current_iter(), booster$dump_model)`?
Edit: Never mind, the above won't work because `dump_model` returns everything up to the selected iteration, not just the selected iteration. For context, I'm trying to run this through `treeshap::unify()`.
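A small illustration of that cumulative behavior (a sketch; `model` is any fitted Booster):

```r
# dump_model(num_iteration = k) returns JSON for ALL trees up to iteration k,
# so each dump strictly contains the earlier ones rather than a single slice
dump_5  <- model$dump_model(num_iteration = 5L)
dump_10 <- model$dump_model(num_iteration = 10L)
nchar(dump_5) < nchar(dump_10)  # TRUE: dumps grow with num_iteration
```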
I'm closing this in favor of #6380, which describes the same problem thoroughly with a reproducible example. Let's please focus there.