Fairness sections generate an error with the latest versions of pandas (1.5.0) and seaborn (0.12.0)

Open desilinguist opened this issue 3 years ago • 0 comments

With those two versions, generating a report with the fairness sections included yields the following error:

ValueError                                Traceback (most recent call last)
Cell In [58], line 110
    107 with sns.axes_style('whitegrid'), sns.plotting_context('notebook', font_scale=2):
    108     g = sns.FacetGrid(df_coefs_all, col="metrics",
    109                       height=10, col_order = ['osa', 'osd', 'csd'])
--> 110     g.map_dataframe(errplot, group, "error_estimate",  "CI").set_axis_labels("Error estimate",
    111                                                                             group)
    113     imgfile = join(figure_dir, '{}_fairness_estimates_{}.svg'.format(experiment_id, group))
    114     plt.savefig(imgfile)

...

File ~/anaconda/envs/rsmdev/lib/python3.9/site-packages/matplotlib/axes/_axes.py:3587, in Axes.errorbar(self, x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, barsabove, lolims, uplims, xlolims, xuplims, errorevery, capthick, **kwargs)
   3584 res = np.zeros(err.shape, dtype=bool)  # Default in case of nan
   3585 if np.any(np.less(err, -err, out=res, where=(err == err))):
   3586     # like err<0, but also works for timedelta and nan.
-> 3587     raise ValueError(
   3588         f"'{dep_axis}err' must not contain negative values")
   3589 # This is like
   3590 #     elow, ehigh = np.broadcast_to(...)
   3591 #     return dep - elow * ~lolims, dep + ehigh * ~uplims
   3592 # except that broadcast_to would strip units.
   3593 low, high = dep + np.row_stack([-(1 - lolims), 1 - uplims]) * err

ValueError: 'xerr' must not contain negative values

This is happening because in the fairness_analysis.ipynb notebook, we compute the confidence intervals via this statement:


df_coefs_all['CI'] = df_coefs_all['[0.025'] - df_coefs_all['estimate']

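However, the '[0.025' column holds the lower bound of the confidence interval, so this difference is negative, and the errorbar call shown in the traceback above no longer accepts negative error values. So, all we need to do is to use absolute values, i.e., replace the above statement with: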

df_coefs_all['CI'] = np.abs(df_coefs_all['[0.025'] - df_coefs_all['estimate'])
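As a quick sanity check, here is a minimal sketch of the difference between the two statements. The data frame below is made up for illustration (the real df_coefs_all is built inside the notebook); only the column names 'estimate', '[0.025', and 'CI' come from the statements above.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only; the column names mirror
# the ones used in the notebook statements above.
df_coefs_all = pd.DataFrame({
    "estimate": [0.10, -0.25, 0.40],
    "[0.025": [0.02, -0.40, 0.31],  # lower bounds, each below its estimate
})

# Original statement: lower bound minus estimate is negative for every row,
# which is what triggers the "'xerr' must not contain negative values" error.
signed_ci = df_coefs_all["[0.025"] - df_coefs_all["estimate"]

# Proposed replacement: the absolute value gives a non-negative half-width
# that can be passed as an error-bar length.
df_coefs_all["CI"] = np.abs(df_coefs_all["[0.025"] - df_coefs_all["estimate"])

print(signed_ci.tolist())
print(df_coefs_all["CI"].tolist())
```

The magnitudes are unchanged; only the sign differs, so the plotted error bars should keep the same length as before.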

Until this fix is merged, the workaround is simply to downgrade pandas to v1.4.3 and seaborn to v0.11.2.

desilinguist, Sep 21 '22 20:09