BUG when judging ndim of a series in dataframe
File: qlib/contrib/model/gbdt.py, line 43:

    x, y = df["feature"], df["label"]
    if y.values.ndim == 2 and y.values.shape[1] == 1:
        y = np.squeeze(y.values)
    else:
        raise ValueError("LightGBM doesn't support multi-label training")

y is a Series taken from a DataFrame, so y.values.ndim can only be 1 and can never be 2. The condition therefore always fails and the ValueError is raised.
Hi, @huyp182. Can you tell me how to reproduce this?
Hi, @huyp182
Thank you for your attention to qlib. The issue you mentioned is not a bug, because:
- The DataFrame has multi-level indices (datetime, instrument);
- The DataFrame has multi-level columns (feature and label);
- There is only one subcolumn, LABEL0, under the label column.
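Your observation rests on the fact that "Series values must be 1D," which is correct. However, because the columns of this DataFrame are multi-level, df["label"] selects the whole top-level label group and returns a DataFrame (with the single column LABEL0), not a Series. Therefore y.values.ndim == 2 and y.values.shape[1] == 1 both hold, and the check passes.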
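To illustrate, here is a minimal sketch with toy data. The instrument codes, the feature names F0 and F1, and the values are made up for the example and are not taken from Qlib; only the index and column layout follows the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: (datetime, instrument) index and
# two-level columns, with a single LABEL0 subcolumn under "label".
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-02", "2020-01-03"]), ["SH600000", "SZ000001"]],
    names=["datetime", "instrument"],
)
columns = pd.MultiIndex.from_tuples(
    [("feature", "F0"), ("feature", "F1"), ("label", "LABEL0")]
)
df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3), index=index, columns=columns
)

# Same selection as in gbdt.py: a top-level key on multi-level columns
# returns a DataFrame, so .values is 2-D with one column.
x, y = df["feature"], df["label"]
y_is_frame = isinstance(y, pd.DataFrame)
y_ndim = y.values.ndim
y_shape = y.values.shape
y_squeezed = np.squeeze(y.values)  # 1-D array, as LightGBM expects

# For contrast: with flat (single-level) columns the same selection
# returns a Series, whose values are always 1-D.
flat = pd.DataFrame({"feature": [1.0, 2.0], "label": [0.1, 0.2]})
flat_is_series = isinstance(flat["label"], pd.Series)
flat_ndim = flat["label"].values.ndim

print(type(y).__name__, y_ndim, y_shape, y_squeezed.shape)
print(type(flat["label"]).__name__, flat_ndim)
```

So the ValueError should only appear if the frame passed in has flat columns, or if there is more than one subcolumn under label.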
Reference code: LGBModel._prepare_data -> DatasetH.prepare -> DatasetH._prepare_seg -> DataHandler.fetch -> DataHandler._fetch_data