Fillna does not work if fields_group is not None
🐛 Bug Description
The Fillna processor does not work if fields_group is not None, because writing into df.values does not modify the DataFrame: the array it returns can be a copy, so the filled values are discarded.
To Reproduce
Use any model and specify fields_group for the Fillna processor.
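A minimal sketch of the symptom using plain pandas, without qlib. The toy frame and its column names are hypothetical. The mixed dtypes are an assumption chosen so that df.values has to build a new array. Whether df.values is a view or a copy on a real qlib frame depends on the pandas version and the frame layout.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a float column with one NaN plus an integer column.
# With mixed dtypes, df.values builds a fresh float64 array instead of a view.
df = pd.DataFrame({"feature": [1.0, np.nan, 3.0], "label": [0, 1, 0]})

nan_select = np.isnan(df.values)
# This writes into the temporary array returned by df.values, not into df.
df.values[nan_select] = 0.0

# Count the NaN values that are still present in the DataFrame.
remaining_nans = int(df["feature"].isna().sum())
print(remaining_nans)
```

If the count printed is non-zero, the fill was applied to a copy and lost, which matches the behavior described above.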
Expected Behavior
No NaN values remain after calling Fillna.
Additional Notes
Same as the issue here: https://github.com/microsoft/qlib/issues/1307#issuecomment-1785284039.
I think simply using slice assignment would be ok:
def __call__(self, df):
    cols = get_group_columns(df, self.fields_group)
    df.loc[:, cols] = df.loc[:, cols].fillna(self.fill_value)
    return df
Alternatively, if you want to keep using numpy for speed (I see about a 10x speed-up), assign df.values (or df.to_numpy()) to a variable first, then fill that array and assign it back:
def __call__(self, df):
    if self.fields_group is None:
        df.fillna(self.fill_value, inplace=True)
    else:
        cols = get_group_columns(df, self.fields_group)
        # this implementation is extremely slow
        # df.fillna({col: self.fill_value for col in cols}, inplace=True)
        #! similar to qlib.data.dataset.processor.Fillna, we use numpy to accelerate
        #! but instead, we assign the numpy array to a variable first
        df_values = df[cols].to_numpy()
        nan_select = np.isnan(df_values)
        #! then fill value and assign back
        df_values[nan_select] = self.fill_value
        df.loc[:, cols] = df_values
    return df
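To check that the two proposals agree, here is a standalone sketch that runs both on a toy frame. The function names fillna_loc and fillna_numpy, the toy data, and the explicit cols list (standing in for the result of get_group_columns) are hypothetical and not part of qlib. The copy=True argument is an extra precaution I am assuming here, so that the array is writable regardless of pandas version.

```python
import numpy as np
import pandas as pd


def fillna_loc(df, cols, fill_value=0.0):
    """Slice-assignment variant: fill NaN in `cols` through DataFrame.fillna."""
    df.loc[:, cols] = df.loc[:, cols].fillna(fill_value)
    return df


def fillna_numpy(df, cols, fill_value=0.0):
    """NumPy variant: fill a detached array, then write it back into df."""
    # copy=True (an assumption, not in the original snippet) guarantees a
    # writable array that is independent of the DataFrame.
    values = df[cols].to_numpy(dtype=float, copy=True)
    values[np.isnan(values)] = fill_value
    df.loc[:, cols] = values
    return df


# Hypothetical data: two feature columns to fill and one column left alone.
raw = pd.DataFrame(
    {"f1": [1.0, np.nan, 3.0], "f2": [np.nan, 5.0, 6.0], "y": [np.nan, 0.1, 0.2]}
)
cols = ["f1", "f2"]

by_loc = fillna_loc(raw.copy(), cols)
by_numpy = fillna_numpy(raw.copy(), cols)

print(by_loc.equals(by_numpy))
```

Both variants write back through df.loc, so the result no longer depends on whether the intermediate array is a view or a copy. Columns outside cols keep their NaN values.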