pyLightGBM
Support init score
LightGBM makes it possible to pass an init score (as an array) in the form of an additional file (train.txt.init).
Can you support this as well, as an input to the fit() function?
It is very useful for regression tasks, where initializing with zeros is not a good choice and the mean of the target works better.
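For reference, this is roughly what one has to do today (a sketch; as far as I know LightGBM auto-loads an .init file sitting next to the data file, one score per row):

import numpy as np

# y_train: training target, already loaded elsewhere.
# Writing "train.txt.init" next to "train.txt" makes LightGBM
# start boosting from these scores instead of from zeros.
init_scores = np.full(len(y_train), np.mean(y_train))  # mean of target
np.savetxt("train.txt.init", init_scores)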
Thanks
Just updated the master branch. Here is an example.
Would you mind creating a concrete example showing the advantage of using 'init_scores' ?
Thanks
Here is a draft, but it does not produce the desired effect (i.e., init_score does not make the model converge faster given the same number of iterations):
import numpy as np
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMRegressor

# Parameters
seed = 1337
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"
offset = 1e4

np.random.seed(seed)  # for reproducibility

X, y = datasets.make_regression(n_samples=1000, random_state=seed)
# shift the distribution by a huge margin to see if `init_scores` helps convergence
y += offset

x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=seed)

params = {
    'exec_path': path_to_exec,
    'num_iterations': 10, 'learning_rate': 0.1,
    'num_leaves': 10, 'is_training_metric': True,
    'min_data_in_leaf': 10, 'is_unbalance': False,
    'early_stopping_round': 10, 'verbose': False
}

# Baseline: fit without init scores
clf = GBMRegressor(**params)
clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(metrics.mean_squared_error(y_test, y_pred), clf.best_round))

# Same model, this time starting from the offset as initial score
clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)],
        init_scores=offset * np.ones(len(x_train)))
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(metrics.mean_squared_error(y_test, y_pred), clf.best_round))
Any thoughts?
I am solving a regression task where np.average(target_learn) = 15, and my matrix is [500k x 400]. The best setting for me so far is 7500 iterations with learning_rate = 0.002. In such a case (very small learning_rate) this option can save about 1500 iterations (a 20% boost) in the xgboost version.
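In code, what I have in mind is something like this (a sketch, reusing the GBMRegressor API from your example with my settings):

import numpy as np
from pylightgbm.models import GBMRegressor

# Start boosting from the mean of the target (~15 here) instead of zero;
# with a very small learning_rate this should save the early iterations
# that would otherwise be spent just drifting towards the mean.
init = np.full(len(y_train), np.average(y_train))
clf = GBMRegressor(exec_path=path_to_exec, num_iterations=7500, learning_rate=0.002)
clf.fit(x_train, y_train, init_scores=init)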
I will test this and let you know. Thx, I appreciate your work!
Hmm, I believe there is a bug somewhere (not sure if it is here or in LightGBM).
In your example I need to do:
y_pred = clf.predict(x_test) + offset * np.ones(len(x_test))
to get correct test predictions...
Also, the log shows the wrong test error when init_score is present...
Thinking about it more, y_pred = clf.predict(x_test) + offset * np.ones(len(x_test)) is probably the correct behavior for general-purpose usage (mainly for continuing to learn from another model's predictions), so only the printed validation error needs to be fixed...
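To make that concrete, the full pattern would then look roughly like this (a sketch based on the example above; the key point is adding the init scores back on top of the raw predictions):

import numpy as np
from sklearn import metrics

# fit with init scores as before
clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)],
        init_scores=offset * np.ones(len(x_train)))

# predict() returns only the boosted part on top of the init score,
# so the init score has to be added back manually:
y_pred = clf.predict(x_test) + offset * np.ones(len(x_test))
print("MSE: {}".format(metrics.mean_squared_error(y_test, y_pred)))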