[Bug] ValueError caused by column with nan values
When using shapash with the following code:
```python
xpl = SmartExplainer(
    model=model,
)
xpl.compile(
    x=test_df,
    ...
)
```
it calls `init_app` in the `SmartApp` class, where the following code (lines 192-199 of `smart_app.py`) computes the standard deviation of each float column:
```python
for col in list(self.dataframe.columns):
    typ = self.dataframe[col].dtype
    if typ == float:
        std = self.dataframe[col].std()
        if std != 0:
            digit = max(round(log10(1 / std) + 1) + 2, 0)
            self.round_dataframe[col] = self.dataframe[col].map(f"{{:.{digit}f}}".format).astype(float)
```
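To make the rounding rule above concrete, here is a small standalone sketch that evaluates the same `digit` formula for a few standard deviations. `rounding_digits` is a hypothetical helper name used only for illustration; the formula itself is copied from the quoted loop.

```python
from math import log10


def rounding_digits(std: float) -> int:
    """Number of decimals kept for a float column (same formula as the quoted loop)."""
    return max(round(log10(1 / std) + 1) + 2, 0)


# Smaller spread -> more decimals kept; large spread -> clamped at 0 decimals.
for std in (0.05, 0.5, 1000.0):
    digit = rounding_digits(std)
    fmt = f"{{:.{digit}f}}"  # e.g. "{:.4f}" when digit == 4
    print(std, digit, fmt.format(3.14159265))
```

For example, `std = 0.05` gives `log10(20) + 1 ≈ 2.30`, which rounds to 2, so 4 decimals are kept. The conversion to `int` happens inside `round()`, which is why a NaN standard deviation fails at that step.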
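However, when a column contains NaN values, `std` is NaN. Since `NaN != 0` evaluates to True, the `if std != 0` check does not filter it out, and the following line is executed:
```python
digit = max(round(log10(1 / std) + 1) + 2, 0)
```
which raises a ValueError: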
```
File "xxx/.local/lib/python3.10/site-packages/shapash/webapp/smart_app.py", line 197, in init_data
    digit = max(round(log10(1 / std) + 1) + 2, 0)
ValueError: cannot convert float NaN to integer
```
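A minimal reproduction without shapash, assuming only pandas and NumPy: it applies the same formula to an all-NaN float column. `rounding_digits` and `all_nan` are illustrative names, not part of shapash.

```python
from math import log10

import numpy as np
import pandas as pd


def rounding_digits(std: float) -> int:
    # Same expression as line 197 of smart_app.py quoted above.
    return max(round(log10(1 / std) + 1) + 2, 0)


all_nan = pd.Series([np.nan, np.nan, np.nan], dtype=float)
std = all_nan.std()  # NaN: there is no non-missing value to compute a spread from
print(std, std != 0)  # NaN compares unequal to everything, so the guard passes

error_message = None
try:
    rounding_digits(std)
except ValueError as exc:
    # log10(NaN) returns NaN; round() then cannot convert it to an int.
    error_message = str(exc)
print(error_message)
```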
Python version : python3.10
Shapash version : shapash-2.5.0
Operating System : CentOS Linux release 8.2.2.2004
Thank you for reporting this issue. If I understand correctly, your column contains only NaN values, is that right? The code should not fail if a column contains just some NaN values. Can you tell us why such a column is used in your model? We don't see the use case. In any case, we will look at your PR and find the best way to tackle this issue.
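The distinction above comes from how pandas computes `Series.std()`: missing values are skipped by default (`skipna=True`), and the sample standard deviation uses `ddof=1`. A quick check, with illustrative series names, shows that a partially missing column gives a finite value, while an all-NaN column (and also a column with a single non-missing value) gives NaN:

```python
import numpy as np
import pandas as pd

partial = pd.Series([1.0, np.nan, 3.0])  # NaN is skipped: std of [1.0, 3.0]
all_nan = pd.Series([np.nan, np.nan])    # nothing left to compute from
single = pd.Series([2.0, np.nan])        # one value with ddof=1 -> undefined

print(partial.std())  # about 1.4142
print(all_nan.std())  # nan
print(single.std())   # nan
```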
Hi there. Usually, all-NaN columns can be removed during data preprocessing. In my case, however, the data consists of several time series, and one column contains only NaN values before a specific date. In my experiments I need to split the data by date for training, so in a split that falls before that date this column contains only NaN values, which causes this error.
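The scenario described above can be sketched with a toy time series. Everything here is assumed for illustration: the column names `date`, `feature_a`, `feature_b`, the dates, and the cutoff are hypothetical. A feature that only starts being recorded at the cutoff is entirely NaN in the earlier split, even though it is well populated in the full dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=6, freq="D"),
    "feature_a": [0.1, 0.4, 0.2, 0.8, 0.5, 0.3],
    # Hypothetical feature that is only recorded from 2023-01-04 onwards.
    "feature_b": [np.nan, np.nan, np.nan, 1.5, 2.5, 2.0],
})

cutoff = pd.Timestamp("2023-01-04")
early = df[df["date"] < cutoff]
late = df[df["date"] >= cutoff]

# Columns that are entirely missing in the early split.
all_nan_cols = [c for c in early.columns if early[c].isna().all()]
print(all_nan_cols)              # ['feature_b']
print(early["feature_b"].std())  # nan, which is what reaches log10(1 / std)
```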
It has been fixed in shapash version 2.6.0 (https://github.com/MAIF/shapash/pull/553).
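For anyone patching an older version locally, one possible guard is to skip columns whose standard deviation is not finite, in addition to the existing zero check. This is a hypothetical sketch written as a standalone function (`round_float_columns` is an illustrative name); it is not necessarily how PR #553 implements the fix.

```python
from math import isfinite, log10

import numpy as np
import pandas as pd


def round_float_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical guarded version of the rounding loop quoted in this issue."""
    rounded = df.copy()
    for col in df.columns:
        if df[col].dtype == float:
            std = df[col].std()
            # NaN std (all-NaN or single-value column) or zero std: leave as-is.
            if not isfinite(std) or std == 0:
                continue
            digit = max(round(log10(1 / std) + 1) + 2, 0)
            rounded[col] = df[col].map(f"{{:.{digit}f}}".format).astype(float)
    return rounded


df = pd.DataFrame({"a": [0.123456, 0.2, 0.35], "b": [np.nan, np.nan, np.nan]})
out = round_float_columns(df)
print(out)  # column "a" rounded to 4 decimals, column "b" left untouched
```

Skipping the column keeps its original values instead of inventing a rounding precision for data that has no measurable spread.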