LightGBM
Median wrongly computed
import pandas as pd
import lightgbm as lgb

# Toy dataset with an outlier; the median of Y is 3
data = {"X": [1, 2, 3, 4, 5],
        "Y": [1, 2, 3, 4, 100]}
df = pd.DataFrame(data)
X_train, y_train = df[["X"]], df["Y"]

# Create a LightGBM dataset from the training data
train_data = lgb.Dataset(X_train, label=y_train)

# Set the parameters for the LightGBM model
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression_l1',  # Use MAE for the objective
    'metric': 'mae',
    'num_leaves': 2,
    'min_data_in_leaf': 10,  # More samples per leaf than the dataset has, so no split is possible
    'learning_rate': 1,
    'feature_fraction': 1.0,
    'num_iterations': 1,  # Only use one tree
}

# Train the LightGBM model
model = lgb.train(params, train_data)

# Make predictions on the training data
y_pred = model.predict(X_train)

# Output the predictions
print("Predictions:")
print(y_pred)
Given the above code, the tree cannot split because min_data_in_leaf exceeds the dataset size, so the model should predict a constant equal to the median of Y (3). However, the output is:
[LightGBM] [Warning] There are no meaningful features, as all feature values are constant.
[LightGBM] [Info] Total Bins 0
[LightGBM] [Info] Number of data points in the train set: 5, number of used features: 0
[LightGBM] [Info] Start training from score 3.500000
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
Predictions:
[3.5 3.5 3.5 3.5 3.5]
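As a quick sanity check on the expected value, using only the data from the snippet above, NumPy's median agrees with 3, not 3.5:

```python
import numpy as np

# Targets from the example above; with five values the median is the
# middle element of the sorted array, sorted_y[2] = 3.
y = np.array([1, 2, 3, 4, 100])
med = np.median(y)
print(med)  # -> 3.0
```

The 3.5 that LightGBM reports is consistent with interpolating halfway between the sorted values at positions 2 and 3 (i.e. between 3 and 4), which is exactly what an off-by-one percentile position would produce.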
I found the issue. It's this line:
https://github.com/microsoft/LightGBM/blob/ac37bf8a1dcc981dedadd9943e338e10ec072c01/src/objective/regression_objective.hpp#L27
With cnt_data = 5 and alpha = 0.5, the position of the median should be 2, but this formula gives 2.5.
The correct formula should be:
const double float_pos = static_cast<double>(cnt_data - 1) * alpha;
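To illustrate the difference, here is a small Python sketch (not the actual C++ implementation; it assumes the usual linear interpolation between the two neighboring sorted values, which matches the 3.5 seen in the log) comparing the two position formulas on this dataset:

```python
import numpy as np

sorted_y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # sorted targets
cnt_data, alpha = len(sorted_y), 0.5

def percentile_at(float_pos, values):
    # Linear interpolation between the neighboring sorted values
    left = int(np.floor(float_pos))
    right = int(np.ceil(float_pos))
    frac = float_pos - left
    return values[left] * (1 - frac) + values[right] * frac

# Current formula: position cnt_data * alpha = 2.5, which interpolates
# between sorted_y[2] = 3 and sorted_y[3] = 4.
current = percentile_at(cnt_data * alpha, sorted_y)
print(current)  # -> 3.5

# Proposed formula: position (cnt_data - 1) * alpha = 2.0, which lands
# exactly on the true median sorted_y[2] = 3.
proposed = percentile_at((cnt_data - 1) * alpha, sorted_y)
print(proposed)  # -> 3.0
```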
Is this intentional, or is it a bug?