fairlearn
ENH Updated EY Notebook
This is an in-progress version of the EY whitepaper notebook. The main goal of this notebook is to highlight some of the functionality of Fairlearn (e.g. how to select an inner model of ExponentiatedGradient, how to incorporate uncertainty quantification).
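For readers skimming the PR, here is a minimal sketch of what "selecting an inner model of ExponentiatedGradient" means, using fairlearn's standard reductions API on toy data (the toy features, labels, and sensitive feature are placeholders, not the notebook's actual data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, EqualizedOdds

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                      # toy features
A = rng.choice(["Female", "Male"], size=500)       # toy sensitive feature
y = (X[:, 0] + (A == "Male") * 0.5 + rng.normal(size=500) > 0).astype(int)

# The "inner" model is the ordinary estimator that ExponentiatedGradient
# repeatedly refits on reweighted data to satisfy the fairness constraint.
inner_model = LogisticRegression(max_iter=1000)
mitigator = ExponentiatedGradient(inner_model, constraints=EqualizedOdds())
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
```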
The goal of this draft pull request is to gather community feedback about the current workflow and whether this use case aligns with our community values. Some concerns that have been surfaced:
- We introduce a synthetic feature `LIMIT` to generate a disparity between `Male` and `Female` applicants in the model's performance.
- We are using a dataset pertaining to loans in Taiwan (collected in 2005) to motivate an example around credit lending in the United States.
Items still left to complete:
- Add conclusion sections.
- Add problem introduction and motivation section.
Are you sure you want to put it in an ipynb, not .py in the examples folder?
Miro and I discussed leaving it in .ipynb format for now and converting to a .py file at a later time. In my opinion, the SciPy tutorial/PyData Global notebook is a better fit for the examples folder.
Thanks for elaborating! I agree on converting other use cases to `.py`, but that doesn't necessarily mean we should leave this one in `.ipynb`. That said, perhaps this is a separate issue that can be tackled in another PR. I was probably optimistically thinking we'd do all of those tasks at one go. I'm not sure whether we should add more notebooks into that folder, though. Not converting to `.py` is one thing, but adding even more that eventually need converting is not a step in the right direction IMO. I'm all in favor of fixing/updating existing ones, of course.
Tagging @MiroDudik since you mentioned that there was some prior discussion so that he's aware.
I think that the idea is that the notebook in this PR would replace this one: https://github.com/fairlearn/fairlearn/blob/main/notebooks/Binary%20Classification%20with%20the%20UCI%20Credit-card%20Default%20Dataset.ipynb
I agree that we should discourage creating new notebooks in `.ipynb`, but I'm viewing this specific PR as a replacement of an existing notebook, so I'm okay with it being `.ipynb`. This is being linked to from our landing page, so I'm in favor of expediency here.
That works for me. I see one newly added ipynb and one changed one, though. Are we planning to delete one of these? @LeJit
Perhaps tangential, but I'm really a fan of the notebooks you (and others) created for SciPy and PyData Global. If we ever want to link to those I'd be very much in favor.
Yes, the end goal is to delete the old notebook eventually. I left it in for now so that people have something to compare the new notebook to.
Hey @mmadaio @hildeweerts, when you have the time, could you take a look at this? I think there are always inherent issues since I'm synthesizing the disparity, but there are parts of this notebook that I think are worthwhile, and we may want to discuss using certain parts of it in our educational materials.
@MiroDudik The final "draft" is ready.
Hey @mmadaio,
I finally got around to updating the notebook to reflect your feedback. A summary of the changes is below:
> It would be good to add more explanation as to why you're using "sex" as the sensitive feature.
I added a section at the beginning about the Apple credit card story where the model allocated lower credit limits to women compared to men, even within the same household.
> Perhaps relatedly, I didn't understand the renamed column names for PAY_0 and PAY_1. Aren't "default payment next month" and "default" the same thing?
Added a small explanation in the notebook about the renaming. `PAY_0` is renamed to `PAY_1` to keep the naming consistent with the other features, and `default payment next month` is renamed to `default` to reduce the verbosity of the feature name (I didn't want to deal with a feature that had spaces in its name).
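Concretely, the renaming is a single pandas call; `df` here is assumed to be the loaded UCI credit-card DataFrame:

```python
# PAY_0 -> PAY_1 for consistency with PAY_2..PAY_6; the long label is
# shortened to "default" purely to avoid spaces in the column name.
df = df.rename(
    columns={
        "PAY_0": "PAY_1",
        "default payment next month": "default",
    }
)
```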
> I may be confused, but there's a lot of references to "loans" but, if I understand correctly, the dataset is of people paying off monthly credit card bills (or defaulting on them).
I changed the "story" of the case study to be predicting loan default rates using the monthly credit card payment dataset, where we use failure to make monthly card payments as a sign of poor "creditworthiness". Hopefully, this addresses the discrepancy between the two tasks.
> What does the term "limit" for the synthetic feature refer to? Is this a commonly used term for this that readers will be familiar with? (and is this a common way of defining it? e.g., using a Gaussian distribution with those criteria)
I changed the synthetic feature to represent `Interest` rates. The story behind the feature is now that people who have defaulted on credit card payments are given higher interest rates, and there is more variance in the interest rates allocated to women compared to men.
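To make that construction concrete, here is one way such a feature could be generated with numpy; the means and standard deviations are illustrative assumptions rather than the notebook's actual values, and the `default`/`SEX` column encodings are placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)

def synthetic_interest(defaulted, is_female):
    # Assumed numbers, for illustration only: defaulters draw from a
    # higher-mean rate, and women draw with a larger standard deviation.
    mean = 0.12 if defaulted else 0.06
    std = 0.04 if is_female else 0.02
    return rng.normal(mean, std)

df["Interest"] = [
    synthetic_interest(d == 1, s == "Female")
    for d, s in zip(df["default"], df["SEX"])
]
```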
Codecov Report

:exclamation: No coverage uploaded for pull request base (`main@504804b`). The diff coverage is `n/a`.

| Metric | main | #1008 |
|---|---|---|
| Coverage | ? | 94.12% |
| Files | ? | 52 |
| Lines | ? | 2194 |
| Branches | ? | 0 |
| Hits | ? | 2065 |
| Misses | ? | 129 |
| Partials | ? | 0 |

| Flag | Coverage Δ |
|---|---|
| unittests | 94.12% <0.00%> (?) |
> I agree that we should discourage creating new notebooks in `.ipynb`, but I'm viewing this specific PR as a replacement of an existing notebook, so I'm okay with it being `.ipynb`. This is being linked to from our landing page, so I'm in favor of expediency here.
Then I'd like to remove the old one in this PR and add a new `.py` file instead.
I've opened #1108 to discuss this point...
@MiroDudik I made all the changes you requested. You can take another look to see if there is anything else I need to change. I also deleted the old notebook.
I'll add the new unit tests in a later commit.
Thank you @romanlutz
Of course! For context, we discussed this in the community call last week. Since there were quite a few ipynb-to-py (and md-to-rst) peculiarities that needed addressing, I volunteered to take care of it, as I've done this a few times. I suppose I should have provided a full-page screenshot to make this easier to review.
... and I already found a couple of `$` signs that I hadn't converted yet. 😆
@LeJit why is seaborn in the dependencies? I don't see it used.
@romanlutz There were some things with visualizing and comparing error rates across different models at the end that Miro and I wanted to include, but we eventually decided to cut. You can remove the `seaborn` dependency and, I believe, the `CalibratedClassifierCV` import as well.
I'll try to get to it today. I have a bunch of tiny fixes locally as well. I suspect the linter will make me do a whole lot of fixes, too.
You may want to merge PR #1163 to make the linter work again.
This is now up-to-date with the latest main and I've fixed all flake8 issues. The remaining open issues, afaict:

- we're not using `fetch_credit_card` because the column names are off (see corresponding comment)
- we're not using `plot_model_comparison`

These seem like easily fixable problems in small follow-up PRs that don't affect the validity of the example. If either of you (@MiroDudik, @LeJit) want to take a stab at them, please feel free to push your updates directly. I've done the part that I had promised 😄
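For anyone picking up the `fetch_credit_card` follow-up, the loading step itself is short; this sketch assumes the helper follows fairlearn's sklearn-style fetch API with an `as_frame` flag, and the column-name mismatch mentioned above would then be fixed with a rename on the returned frame:

```python
from fairlearn.datasets import fetch_credit_card

# Load the UCI credit-card dataset via fairlearn's helper; as_frame=True
# returns pandas objects. Whether the resulting column names match what
# the notebook expects is exactly the open issue noted above.
data = fetch_credit_card(as_frame=True)
X, y = data.data, data.target
```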
I think it would be better to make those changes in follow-up PRs. I really can't think of any way to identify the Pareto-dominated inner models that doesn't require us to compute the `balanced_accuracy` and `equalized_odds` for each of the inner models, so we're always going to have to do that inner-models sweep.
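A sketch of that sweep, assuming the fitted `ExponentiatedGradient` object (`mitigator` here) exposes its inner models via its `predictors_` attribute, and using `balanced_accuracy_score` and fairlearn's `equalized_odds_difference`; `X_test`, `y_test`, and `A_test` are placeholder hold-out data:

```python
from sklearn.metrics import balanced_accuracy_score
from fairlearn.metrics import equalized_odds_difference

# Score every inner model found during the ExponentiatedGradient run.
results = []
for predictor in mitigator.predictors_:
    y_pred = predictor.predict(X_test)
    results.append((
        balanced_accuracy_score(y_test, y_pred),
        equalized_odds_difference(y_test, y_pred, sensitive_features=A_test),
    ))

# A model is Pareto-dominated if some other model is at least as good on
# both axes (higher accuracy, lower disparity) and strictly better on one.
dominated = [
    any(a2 >= a1 and d2 <= d1 and (a2 > a1 or d2 < d1)
        for j, (a2, d2) in enumerate(results) if j != i)
    for i, (a1, d1) in enumerate(results)
]
```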
Also, if we want an example using `plot_model_comparison`, I personally think the SciPy/PyCon tutorial would be a better place to incorporate that method, since we're not filtering dominated classifiers in that example. What do you think, @MiroDudik?
I tried to deal with the crashing website build, but the issue is that the Sphinx gallery simply takes too long to build on CircleCI...
UPDATE: I doubled the webpage build timeout to 60 minutes, but it's still timing out.