Support init score

Open gugatr0n1c opened this issue 8 years ago • 5 comments

There is a possibility to give an init score (as an array) in LightGBM, in the form of an additional file (train.txt.init).

Can you support this as well, as an input to the fit() function?

It is very suitable for regression tasks where an init of zeros is not a good choice and the mean of the target works better.
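For illustration, a minimal sketch of producing such a file, assuming the label sits in the first column of train.txt and that LightGBM picks up a file named <train_file>.init with one score per training row:

import numpy as np

# Load the target from the training file (assumes label in the first column).
y_train = np.loadtxt("train.txt", usecols=0)

# Use the mean of the target as the initial score for every row.
init_scores = np.full(len(y_train), y_train.mean())

# One score per line, matching the order of the training rows.
np.savetxt("train.txt.init", init_scores)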

thx

gugatr0n1c avatar Nov 13 '16 09:11 gugatr0n1c

Just updated the master branch; here is an example.

Would you mind creating a concrete example showing the advantage of using init_scores?

Thanks

ArdalanM avatar Nov 13 '16 12:11 ArdalanM

Here is a draft, but it does not produce the desired effect (i.e., init_scores does not make the model converge faster given the same number of iterations):

import numpy as np
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMRegressor

# Parameters
seed = 1337
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"
offset = 1e4

np.random.seed(seed) # for reproducibility
X, y = datasets.make_regression(n_samples=1000, random_state=seed)

# shifting distribution by a huge margin to see if `init_scores` help for convergence
y += offset
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=seed)

params = {'exec_path': path_to_exec,
          'num_iterations': 10, 'learning_rate': 0.1,
          'num_leaves': 10, 'is_training_metric': True,
          'min_data_in_leaf': 10, 'is_unbalance': False,
          'early_stopping_round': 10, 'verbose': False}
clf = GBMRegressor(**params)

# Baseline: fit without init_scores
clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)])
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(metrics.mean_squared_error(y_test, y_pred), clf.best_round))

# Same model, but initialized with the offset as the init score
clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)],
        init_scores=offset * np.ones(len(x_train)))
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(metrics.mean_squared_error(y_test, y_pred), clf.best_round))

Any thoughts?

ArdalanM avatar Nov 13 '16 13:11 ArdalanM

I am solving a regression task where np.average(target_learn) = 15.

And I have a [500k x 400] matrix; the best setting for me so far is 7500 iterations with learning_rate = 0.002. In such a case (very small learning_rate) this option can save about 1500 iterations (a 20% boost) (in the xgboost version).
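For reference, a minimal sketch of the xgboost mechanism being compared here, assuming base_margin is what serves as the init score (the data is a random stand-in for the 500k x 400 matrix):

import numpy as np
import xgboost as xgb

# Random stand-in data; in the real task the target averages around 15.
x_learn = np.random.rand(1000, 400)
target_learn = 15 + np.random.rand(1000)

dtrain = xgb.DMatrix(x_learn, label=target_learn)

# base_margin acts as the per-row init score in xgboost.
dtrain.set_base_margin(np.full(len(target_learn), np.average(target_learn)))

booster = xgb.train({'eta': 0.002, 'objective': 'reg:linear'}, dtrain, num_boost_round=100)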

I will test this and let you know. Thx, I appreciate your work!

gugatr0n1c avatar Nov 13 '16 13:11 gugatr0n1c

Hmm, I believe there is some bug (not sure whether here or in LightGBM).

In your example I need to do:

y_pred = clf.predict(x_test) + offset * np.ones(len(x_test))

to get the correct test prediction...

Also, the log shows the wrong test error if init_scores is present.

gugatr0n1c avatar Nov 13 '16 15:11 gugatr0n1c

Thinking about this: y_pred = clf.predict(x_test) + offset * np.ones(len(x_test)) is probably correct behavior for general-purpose usage (mainly for continuing learning from another model),

so only the printing of val_error needs to be fixed...
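A minimal sketch of that continued-learning pattern, reusing x_train/x_test/y_train/y_test/clf from the example above, with a hypothetical first-stage model:

from sklearn.linear_model import LinearRegression

# Hypothetical base model whose predictions we continue boosting from.
base_model = LinearRegression().fit(x_train, y_train)

clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)],
        init_scores=base_model.predict(x_train))

# The init score is not baked into the model, so add it back at predict time.
y_pred = base_model.predict(x_test) + clf.predict(x_test)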

gugatr0n1c avatar Nov 13 '16 18:11 gugatr0n1c