
[R-package] lgb.importance(): Error: R character strings are limited to 2^31-1 bytes

Open Chuang1128 opened this issue 1 year ago • 7 comments

Hello!

I built a LightGBM model and then called lgb.importance(model). R then shows: Error: R character strings are limited to 2^31-1 bytes.

How do I solve this error? Thank you!

Chuang1128 avatar Jan 27 '24 04:01 Chuang1128

Thanks for using LightGBM.

An error message alone is not enough information for us to help you. Please provide the following:

  • version of R
  • version of {lightgbm}
  • how you installed LightGBM
  • operating system
  • output of running sessionInfo() in your R session (if possible)
  • a minimal, reproducible example that generates this error (docs with some guidance on that)

Here's an example of how to create a reproducible example for the R package: https://github.com/microsoft/LightGBM/issues/4721#issue-1036595701
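For reference, a minimal sketch of what such a reproducible example might look like for lgb.importance() (synthetic data and parameter values here are illustrative placeholders, not taken from this issue):

```r
library(lightgbm)

# small synthetic regression dataset
set.seed(708)
X <- matrix(rnorm(1000L * 10L), ncol = 10L)
y <- rnorm(1000L)

dtrain <- lgb.Dataset(data = X, label = y)

model <- lgb.train(
  params = list(
    objective = "regression",
    metric = "l2",
    min_data_in_leaf = 1L
  ),
  data = dtrain,
  nrounds = 10L
)

# the call that reportedly fails on the large real dataset
lgb.importance(model)
```

Running something like this end to end (substituting data and parameters that actually trigger the error) lets maintainers reproduce the problem locally.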

jameslamb avatar Jan 27 '24 04:01 jameslamb

version of R: 4.3.2

version of {lightgbm}: 3.3.5

how you installed LightGBM: Tool>Install package

operating system: windows

output of running sessionInfo() in your R session:
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default

Chuang1128 avatar Jan 27 '24 05:01 Chuang1128

Thank you for that!

Tool>Install package

What does this mean?

version of {lightgbm}: 3.3.5

Can you please update to the latest version (v4.3.0) from CRAN and try again?

install.packages("lightgbm", repos = "https://cran.r-project.org")

And after that...we still won't be able to help much without a minimal, reproducible example.

jameslamb avatar Jan 27 '24 05:01 jameslamb

My lightgbm package version is 3.3.5 and it still shows the error. I will try to put together a reproducible example.

Chuang1128 avatar Jan 27 '24 05:01 Chuang1128

Train the model:

model <- lgb.train(params = list(objective = "regression", num_iterations = 100, metric = "l2", min_data = 1L, min_data_in_bin = 100, min_gain_to_split = 10), data = train, nrounds = 100)

lgb_imp <- lgb.importance(model)

However, when I decrease num_iterations, the error does not appear.

Chuang1128 avatar Jan 27 '24 05:01 Chuang1128

my lightgbm package version is 3.3.5

Sorry if my placement was confusing. I'm asking what "Tool>Install package" means. Are those buttons you're clicking in an application? If so, what application?

train model

Thanks for this! But it is not a reproducible example.

Crucially... what does train contain? Much of LightGBM's behavior (like any machine learning framework) is dependent on the size, shape, and distribution of the input data.

For example, based only on the error message you've provided, I can think of a few possibilities:

  • your data has features with huge feature names
  • your data has a very large number of rows
  • you have a very large number of features

If you can't provide a reproducible example, can you please at least show the code you used to construct train? Including any code for reading in data from files, databases, etc.

And report the size of the dataset (number of rows, number of columns, exact feature names if there are any).

jameslamb avatar Jan 27 '24 05:01 jameslamb

Yes, I click those buttons in RStudio to install the lightgbm package.

data <- readRDS("D:/data.rds")
dtrain <- lgb.Dataset(data = as.matrix(data[, -1]), label = data[, 1])
model <- lgb.train(params = list(objective = "regression", num_iterations = 100, metric = "l2", min_data = 1L, min_data_in_bin = 100, min_gain_to_split = 10), data = dtrain, nrounds = 100)

lgb_imp <- lgb.importance(model)

My dataset: 4,610,000 rows, 21 columns.

Chuang1128 avatar Jan 27 '24 06:01 Chuang1128

Hi @jameslamb, I'm having the same issue when running lightgbm::lgb.model.dt.tree(lgb_model), and I believe the culprit is this call: lgb.dump(booster = model, num_iteration = num_iteration). My dataset is also very large, with a large number of rows and columns, and is fit with a complex model (i.e., not easy to share).

It looks like lgb.dump is trying to return a single long character string when num_iteration = NULL. I wonder if instead it could be built up as a large list or something, e.g. lapply(1:booster$current_iter(), booster$dump_model)?

Edit: Never mind, the above won't work because dump_model returns everything up to the selected iteration, not just that single iteration. For context, I'm trying to run this through treeshap::unify().

p-schaefer avatar Mar 07 '24 14:03 p-schaefer

I'm closing this in favor of #6380, which describes the same problem thoroughly with a reproducible example. Let's please focus there.

jameslamb avatar Mar 22 '24 14:03 jameslamb