Infinite predictions when using Poisson regression with large weights

Open isadofschi opened this issue 3 years ago • 0 comments

Description

Using objective='poisson' with large sample weights can produce extremely large predictions, in some cases up to $10^{252}$ times higher than the largest target value in the training dataset.

Reproducible example

A minimal working example can be produced with the dataset 'mwe.csv', which can be downloaded from https://pastebin.com/0gw1vz9x

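import pandas as pd
import lightgbm as lgb

# Load the dataset: feature x, target y, sample weight w
df = pd.read_csv("mwe.csv")
print(df['y'].describe())

# Fit a single-tree Poisson model using the large sample weights
model = lgb.LGBMRegressor(objective='poisson', num_trees=1, verbose=2)
model.fit(df[['x']], df['y'], sample_weight=df['w'])

# Compare the distribution of the predictions with that of y printed above
print(pd.Series(model.predict(df[['x']])).describe())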

The example trains a model with a single tree. With higher values of num_trees, training still stops after the first tree.
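
One way to quantify the blow-up reported in the description is to compare the largest prediction with the largest training target on a log scale. The helper below is only a minimal sketch: `blowup_log10` is a hypothetical name (not part of LightGBM), and it assumes the predictions and targets are plain sequences of numbers with at least one positive target.

```python
import math


def blowup_log10(predictions, targets):
    """Return log10(max prediction / max target).

    Returns math.inf if the largest prediction overflowed to infinity.
    Hypothetical helper for illustration; not a LightGBM function.
    """
    max_pred = max(float(p) for p in predictions)
    max_target = max(float(t) for t in targets)
    if max_target <= 0 or max_pred <= 0:
        raise ValueError("expected positive predictions and targets")
    if math.isinf(max_pred):
        return math.inf
    # Subtract logs instead of dividing first, so that huge predictions
    # do not lose precision or overflow in the ratio itself.
    return math.log10(max_pred) - math.log10(max_target)
```

With the example above, `blowup_log10(model.predict(df[['x']]), df['y'])` would give the exponent of the factor, and `model.booster_.num_trees()` should report how many trees were actually trained.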

The tree looks like this:

The problem disappears if we use sample_weight=df['w']/10000 instead. We obtain the following tree:

Dividing the weights by any value between 2008 and 16454 seems to produce the same tree.
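
The range of divisors can be explored with a small scan. The sketch below is an assumption-laden illustration, not a diagnosis: `find_safe_divisors` and `lightgbm_max_prediction` are hypothetical helpers, and the `tolerance` of 10 times the largest target is an arbitrary threshold chosen for the example. The model settings mirror the reproducible example, with `verbose=-1` to silence the log.

```python
import math


def find_safe_divisors(max_prediction_for, divisors, y_max, tolerance=10.0):
    """Return the divisors whose largest prediction is finite and
    at most tolerance * y_max (an arbitrary, illustrative threshold)."""
    safe = []
    for divisor in divisors:
        max_pred = max_prediction_for(divisor)
        if math.isfinite(max_pred) and max_pred <= tolerance * y_max:
            safe.append(divisor)
    return safe


def lightgbm_max_prediction(df, divisor):
    """Fit the single-tree Poisson model from the example with weights
    divided by `divisor` and return the largest prediction."""
    import lightgbm as lgb  # imported lazily so the scan helper stands alone

    model = lgb.LGBMRegressor(objective="poisson", num_trees=1, verbose=-1)
    model.fit(df[["x"]], df["y"], sample_weight=df["w"] / divisor)
    return float(model.predict(df[["x"]]).max())
```

For the dataset in this issue, a call such as `find_safe_divisors(lambda d: lightgbm_max_prediction(df, d), [1, 100, 2008, 10000, 16454], df['y'].max())` would list which of those divisors keep the predictions in a plausible range.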

Environment info

LightGBM version or commit hash: This can be reproduced both with the latest version and with v3.2.1.

Aug 12 '22 17:08 isadofschi