verstack
feval function for LGBMTuner
Can you add a feval function like in lightgbm.train?
It is a customized evaluation function. Each evaluation function should accept two parameters, preds and eval_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples.
Hi. This can be done and I'll gladly implement a feval function in LGBMTuner, but your proposal has some unknowns:
If you want to pass your own evaluation function and expect LGBMTuner to know is_higher_better for this custom function, then the function you pass for evaluation has to carry that information.
E.g. you define your eval func as follows:

def error(true, pred):
    return np.mean(true - pred)

And then you pass it to LGBMTuner as an init argument. Where should is_higher_better be retrieved from for this function?
I can retrieve the .__name__ attribute to return the name of the function you passed, but there is no way for me to know whether higher or lower is better for this custom function.
Would the feval functionality without the is_higher_better attribute be sufficient?
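One possible answer to the is_higher_better question, sketched under the assumption that LGBMTuner adopts the same three-tuple contract as lightgbm.train's feval: since such a function returns (eval_name, eval_result, is_higher_better), a tuner could probe it once on dummy arrays and read both the name and the direction from the return value, instead of requiring a separate init argument. The probe_feval helper below is hypothetical, not part of verstack:

```python
import numpy as np

def probe_feval(feval):
    """Call a lightgbm-style feval once on dummy data to discover
    its reported name and optimization direction.

    Hypothetical helper -- not part of verstack. Assumes feval
    accepts (preds, true_values) as plain numpy arrays and returns
    (eval_name, eval_result, is_higher_better).
    """
    dummy = np.array([1.0, 2.0, 3.0])
    eval_name, _, is_higher_better = feval(dummy, dummy)
    return eval_name, is_higher_better

# A lightgbm-style custom metric: mean absolute error, lower is better.
def custom_mae(preds, true_values):
    error = np.mean(np.abs(true_values - preds))
    return "custom_mae", error, False

name, higher_better = probe_feval(custom_mae)
print(name, higher_better)  # custom_mae False
```

The probe only works if the metric tolerates plain arrays (as the custom_mape posted below does via its isinstance branch), so this is a convention the tuner would have to document, not enforce.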
@DanilZherebtsov Hi!
For example, I have a training method for the lightgbm model:
y = data['price_log']
X = data.drop(columns=['price', 'price_log'])
X_train, X_test, y_train, y_test = scsplit(X, y, stratify=y, test_size=0.3, random_state=5)

# Define the datasets
train_dataset = lgb.Dataset(X_train, y_train, categorical_feature=categorical_cols)
test_dataset = lgb.Dataset(X_test, y_test, reference=train_dataset)

def custom_mape(preds, tests):
    actuals = np.exp(tests.get_label()) if isinstance(tests, lgb.Dataset) else tests
    error = ((abs(actuals - np.exp(preds))) / actuals).sum() / actuals.shape[0]
    is_higher_better = False
    return "custom_mape", error, is_higher_better

params = {'learning_rate': 1,
          'num_leaves': 1,
          'feature_fraction': 1,
          'bagging_fraction': 1,
          'verbosity': -1,
          'random_state': 42,
          'device_type': 'cpu',
          'objective': 'regression',
          'metric': 'mape',
          'num_threads': 1,
          'num_iterations': 1,
          'early_stopping_rounds': 1}

booster = lgb.train(
    params=params,  # Get hyperparameters
    train_set=train_dataset,
    valid_sets=[train_dataset, test_dataset],
    valid_names=['train', 'valid'],
    verbose_eval=False,
    early_stopping_rounds=params['early_stopping_rounds'],
    feval=custom_mape,  # Call the custom MAPE
)
As you can see, my target is the price under a logarithm, and for proper training I needed to write my own evaluation method:
def custom_mape(preds, tests):
    actuals = np.exp(tests.get_label()) if isinstance(tests, lgb.Dataset) else tests
    error = ((abs(actuals - np.exp(preds))) / actuals).sum() / actuals.shape[0]
    is_higher_better = False
    return "custom_mape", error, is_higher_better
This method is used in the parameters of LightGBM itself:
booster = lgb.train(
    params=params,  # Get hyperparameters
    train_set=train_dataset,
    valid_sets=[train_dataset, test_dataset],
    valid_names=['train', 'valid'],
    verbose_eval=False,
    early_stopping_rounds=params['early_stopping_rounds'],
    feval=custom_mape,  # Call the custom MAPE
)
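Since the posted custom_mape falls back to plain arrays when tests is not an lgb.Dataset, its array branch can be exercised without LightGBM at all — a quick sanity check of the three-tuple contract, using made-up prices (preds are log-scale, tests are raw prices, matching how the fallback branch treats its inputs):

```python
import numpy as np

def custom_mape(preds, tests):
    # Array branch of the metric posted above: preds are log-scale
    # predictions, tests are raw actuals (the lgb.Dataset branch
    # is dropped for this standalone check).
    actuals = tests
    error = ((abs(actuals - np.exp(preds))) / actuals).sum() / actuals.shape[0]
    return "custom_mape", error, False

actual_prices = np.array([100.0, 200.0, 400.0])

# Exact log-scale predictions: MAPE is (numerically) zero.
perfect_preds = np.log(actual_prices)
name, err, higher_better = custom_mape(perfect_preds, actual_prices)
print(name, round(err, 12), higher_better)  # custom_mape 0.0 False

# Predictions off by 10% on every row give a MAPE of 0.1.
off_preds = np.log(actual_prices * 1.1)
_, err, _ = custom_mape(off_preds, actual_prices)
print(round(err, 6))  # 0.1
```

The third element of the return value is exactly the is_higher_better information discussed above, which is why the lightgbm-style tuple signature would let LGBMTuner recover the direction without an extra argument.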