Infinite predictions when using Poisson regression with large weights

Open isadofschi opened this issue 3 years ago • 0 comments

Description

Using objective='poisson' with large sample weights produces, in some cases, extremely large predictions: up to $10^{252}$ times larger than the largest target value in the training dataset.
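For a sense of scale: LightGBM's Poisson objective uses a log link, so the prediction is the exponential of the raw tree score. The sketch below only does the arithmetic for the factor quoted above; it does not reproduce the bug, and `factor_exponent` and `raw_score_shift` are illustrative names rather than anything from LightGBM.

```python
import math

# Factor reported in this issue: predictions up to 10**252 times the largest target.
factor_exponent = 252

# With a log link, prediction = exp(raw_score), so multiplying the prediction
# by 10**252 corresponds to adding 252 * ln(10) to the raw score.
raw_score_shift = factor_exponent * math.log(10)
print(f"raw score shift: {raw_score_shift:.2f}")

# exp() of that shift is still finite in float64, which overflows only near exp(709.78).
print(math.isfinite(math.exp(raw_score_shift)))
```

So a leaf value a few hundred too large on the raw scale is enough to explain the symptom. If useful, the raw scores of the fitted model can be inspected with `model.predict(df[['x']], raw_score=True)`.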

Reproducible example

A minimal working example can be produced with the dataset 'mwe.csv', which can be downloaded from https://pastebin.com/0gw1vz9x

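import pandas as pd
import lightgbm as lgb

# Load the dataset: feature 'x', target 'y', sample weight 'w'
df = pd.read_csv("mwe.csv")
print(df['y'].describe())

# Fit a single-tree Poisson model using the original (large) weights
model = lgb.LGBMRegressor(objective='poisson', num_trees=1, verbose=2)
model.fit(df[['x']], df['y'], sample_weight=df['w'])

# Compare the distribution of predictions with the target summary printed above
print(pd.Series(model.predict(df[['x']])).describe())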

The example trains a model with a single tree; however, with higher values of num_trees, training still stops after the first tree.

The tree looks like this:

[image: plot of the tree trained with the original weights]

The problem disappears if we use sample_weight=df['w']/10000 instead. We then obtain the following tree:

[image: plot of the tree trained with the weights divided by 10000]

Dividing the weights by any value between 2008 and 16454 seems to produce the same tree.
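As a possible workaround until the cause is understood, the weights can be rescaled before fitting, as done above with the divisor 10000. The helper below is a minimal sketch; `rescale_weights` is a hypothetical name, not part of LightGBM. Dividing by the mean weight is just a common default, and this report does not establish that the mean of `df['w']` lies in the range of divisors that avoids the problem.

```python
def rescale_weights(weights, divisor=None):
    """Return the weights divided by `divisor` (default: their mean).

    Hypothetical helper, not a LightGBM function. The ratios between
    weights are preserved; only the overall scale changes.
    """
    values = [float(w) for w in weights]
    if divisor is None:
        divisor = sum(values) / len(values)
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return [v / divisor for v in values]


# Made-up weights, only to show the effect of a fixed divisor.
scaled = rescale_weights([20000.0, 40000.0, 60000.0], divisor=10000)
print(scaled)  # [2.0, 4.0, 6.0]
```

With the reproducible example this would be used as `model.fit(df[['x']], df['y'], sample_weight=rescale_weights(df['w'], divisor=10000))`. In pandas the same thing is simply `df['w'] / 10000`, or `df['w'] / df['w'].mean()` for mean normalization.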

Environment info

LightGBM version or commit hash: This can be reproduced both with the latest version and with v3.2.1.

isadofschi, Aug 12 '22 17:08