
[Bug] ValueError caused by column with nan values

Open tswsxk opened this issue 1 year ago • 2 comments

When using shapash with the following code:

xpl = SmartExplainer(
    model=model,
)

xpl.compile(
    x=test_df,
    ...
)

it reaches init_data in the SmartApp class (see the traceback below), where the following code computes the std of each float column (lines 192-199):

        for col in list(self.dataframe.columns):
            typ = self.dataframe[col].dtype
            if typ == float:
                std = self.dataframe[col].std()
                if std != 0:
                    digit = max(round(log10(1 / std) + 1) + 2, 0)
                    self.round_dataframe[col] = self.dataframe[col].map(f"{{:.{digit}f}}".format).astype(float)

However, when a column contains only nan values, std is nan. Because nan != 0 evaluates to True, the check passes and the following line is still executed:

                    digit = max(round(log10(1 / std) + 1) + 2, 0)

which results in a ValueError:

File "xxx/.local/lib/python3.10/site-packages/shapash/webapp/smart_app.py", line 197, in init_data
    digit = max(round(log10(1 / std) + 1) + 2, 0)
ValueError: cannot convert float NaN to integer
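
The failure can be reproduced outside shapash with a minimal sketch that assumes only pandas and the standard library. The helper name rounding_digits is hypothetical and is not part of shapash, and the guard it uses is just one possible fix, not necessarily the one adopted upstream:

```python
from math import log10

import pandas as pd


def rounding_digits(series: pd.Series):
    """Hypothetical helper: decimals to keep, or None to skip rounding."""
    std = series.std()  # NaN when fewer than two non-NaN values are present
    # NaN != 0 is True, so a plain `std != 0` check does not filter NaN out.
    if pd.isna(std) or std == 0:
        return None
    return max(round(log10(1 / std) + 1) + 2, 0)


all_nan = pd.Series([float("nan")] * 3, dtype=float)
some_nan = pd.Series([1.0, float("nan"), 3.0])

# The unguarded expression from the issue raises on the all-NaN column.
try:
    round(log10(1 / all_nan.std()) + 1)
except ValueError as exc:
    print(exc)

print(rounding_digits(all_nan))   # guarded: rounding is skipped
print(rounding_digits(some_nan))  # a partially-NaN column still works
```

A column with only some nan values is unaffected, since pandas skips nan when computing std; only columns with fewer than two non-nan values produce a nan std.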

Python version : python3.10

Shapash version : shapash-2.5.0

Operating System : CentOS Linux release 8.2.2.2004

tswsxk, May 23 '24 08:05

Thank you for reporting this issue. If I understand correctly, you have a column that contains only nan values, is that right? The code should not fail if a column has just some nan values in it. Can you tell us why such a column is used in your model? We don't see the use case. In any case, we will look at your PR and find the best way to tackle this issue.

guillaume-vignal, Jun 06 '24 08:06

Hi there. Usually, all-nan columns can be removed during data preprocessing. However, in my case, the data consists of several time series, and one column contains only nan values before a specific date. In my experiments I need to split the data by date for training, so any split that covers only the period before that date has this column entirely nan, which causes the error.
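
To illustrate the situation, here is a small sketch with made-up data. The column names, dates, and cutoff are hypothetical and only stand in for the real time series:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "feature_b" only starts being recorded on 2024-01-03.
df = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=4, freq="D"),
        "feature_a": [0.1, 0.2, 0.3, 0.4],
        "feature_b": [np.nan, np.nan, 1.5, 2.5],
    }
)

cutoff = pd.Timestamp("2024-01-03")
early = df[df["date"] < cutoff]  # split covering only the early period

# Columns that are entirely NaN within the early split.
all_nan_cols = early.columns[early.isna().all()].tolist()
print(all_nan_cols)

# The std of such a column is NaN, which is what triggers the error.
print(early["feature_b"].std())
```

The column is perfectly valid over the full date range; it only becomes all-nan inside the early split.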

tswsxk, Jun 13 '24 02:06

This has been fixed in version 2.6.0 of shapash (https://github.com/MAIF/shapash/pull/553).

guillaume-vignal, Jul 04 '24 14:07