fairlearn icon indicating copy to clipboard operation
fairlearn copied to clipboard

ENH Updated EY Notebook

Open LeJit opened this issue 2 years ago • 13 comments

This is an in-progress version of the EY whitepaper notebook. The main goal of this notebook is to highlight some of the functionality of Fairlearn (e.g. how to select an inner model of ExponentiatedGradient, how to incorporate uncertainty quantification).

The goal of this draft pull request is to community feedback about the current workflow and whether this use case aligns with our community values. Some concerns that have been surfaced:

  • We introduce a synthetic feature LIMIT to generate a disparity between Male and Female applicants in the model's performance.
  • We are using a dataset pertaining to loans in Taiwan (collected in 2005) to motivate an example around credit lending in the United States.

Items still left to complete:

  • Add conclusion sections.
  • Add problem introduction and motivation section.

LeJit avatar Jan 18 '22 21:01 LeJit

Are you sure you want to put it in an ipynb, not .py in the examples folder?

romanlutz avatar Jan 18 '22 21:01 romanlutz

Miro and I discussed leaving it in .ipynb format for now and converting to a .py file at a later time. In my opinion, the SciPy tutorial/PyData Global notebook is a better fit for the examples folder.

LeJit avatar Jan 19 '22 17:01 LeJit

Miro and I discussed leaving it in .ipynb format for now and converting to a .py file at a later time. In my opinion, the SciPy tutorial/PyData Global notebook is a better fit for the examples folder.

Thanks for elaborating! I agree on converting other use cases to .py, but that doesn't necessarily mean we should leave this one in .ipynb. That said, perhaps this is a separate issue that can be tackled in another PR. I was probably optimistically thinking we'd do all of those tasks at one go. I'm not sure whether we should add more notebooks into that folder, though. Not converting to .py is one thing, but adding even more that eventually need converting is not a step in the right direction IMO. I'm all in favor of fixing/updating existing ones, of course.

Tagging @MiroDudik since you mentioned that there was some prior discussion so that he's aware.

romanlutz avatar Jan 19 '22 18:01 romanlutz

Miro and I discussed leaving it in .ipynb format for now and converting to a .py file at a later time. In my opinion, the SciPy tutorial/PyData Global notebook is a better fit for the examples folder.

Thanks for elaborating! I agree on converting other use cases to .py, but that doesn't necessarily mean we should leave this one in .ipynb. That said, perhaps this is a separate issue that can be tackled in another PR. I was probably optimistically thinking we'd do all of those tasks at one go. I'm not sure whether we should add more notebooks into that folder, though. Not converting to .py is one thing, but adding even more that eventually need converting is not a step in the right direction IMO. I'm all in favor of fixing/updating existing ones, of course.

Tagging @MiroDudik since you mentioned that there was some prior discussion so that he's aware.

I think that the idea is that the notebook in this PR would replace this one: https://github.com/fairlearn/fairlearn/blob/main/notebooks/Binary%20Classification%20with%20the%20UCI%20Credit-card%20Default%20Dataset.ipynb

I agree that we should discourage creating new notebooks in .ipynb, but I'm viewing this specific PR as a replacement of an existing notebook, so I'm okay with it being .ipynb. This is being linked to from our landing page, so I'm in favor of expediency here.

MiroDudik avatar Jan 24 '22 16:01 MiroDudik

That works for me. I see one newly added ipynb and one changed one, though. Are we planning to delete one of these? @LeJit

Perhaps tangential, but I'm really a fan of the notebooks you (and others) created for SciPy and PyData Global. If we ever want to link to those I'd be very much in favor.

romanlutz avatar Jan 25 '22 01:01 romanlutz

Yes. The end goal is to delete the old notebook, eventually. I left it in for now, so that people have something to compare the new notebook to.

LeJit avatar Jan 26 '22 14:01 LeJit

Hey @mmadaio @hildeweerts , when you have the time, could you take a look at this? I think there's always inherent issues since I'm synthesizing the disparity, but there are parts of this notebook that I think are worthwhile, and we may want to discuss using certain parts of it in our educational materials.

LeJit avatar Mar 07 '22 16:03 LeJit

@MiroDudik The final "draft" is ready.

LeJit avatar Mar 21 '22 15:03 LeJit

Hey @mmadaio,

I finally got around to updating the notebook to reflect your feedback. A summary of the changes is below:

It would be good to add more explanation as to why you're using "sex" as the sensitive feature.

I added a section at the beginning about the Apple credit card story where the model allocated lower credit limits to women compared to men, even within the same household.

Perhaps relatedly, I didn't understand the renamed column names for PAY_0 and PAY_1. Aren't "default payment next month" and "default" the same thing?

Added a small explanation in the notebook about the re-naming. PAY_0 is renamed to PAY_1 to keep the naming consistent with the other features, and default payment next month is re-named to default to reduce verbosity of the feature name (I didn't want to deal with a feature that had spaces in its name).

I may be confused, but there's a lot of references to "loans" but, if I understand correctly, the dataset is of people paying off monthly credit card bills (or defaulting on them).

I changed the "story" of the case study to be predicting loan default rates using the monthly credit card payment dataset, where we use failure to make monthly card payments as a sign of poor "creditworthiness". Hopefully, this addresses the discrepancy between the two tasks.

What does the term "limit" for the synthetic feature refer to? Is this a commonly used term for this that readers will be familiar with? (and is this a common way of defining it? e.g., using a Gaussian distribution with those criteria)

I changed the synthetic feature to represent Interest rates. The story behind the feature is now that people who have defaulted on credit card payments are given higher interest rates, and there is more variance in the interest rates allocated to women compared to men.

LeJit avatar Jun 02 '22 20:06 LeJit

Codecov Report

:exclamation: No coverage uploaded for pull request base (main@504804b). Click here to learn what that means. The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1008   +/-   ##
=======================================
  Coverage        ?   94.12%           
=======================================
  Files           ?       52           
  Lines           ?     2194           
  Branches        ?        0           
=======================================
  Hits            ?     2065           
  Misses          ?      129           
  Partials        ?        0           
Flag Coverage Δ
unittests 94.12% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 504804b...54c2c59. Read the comment docs.

codecov[bot] avatar Jun 02 '22 20:06 codecov[bot]

I agree that we should discourage creating new notebooks in .ipynb, but I'm viewing this specific PR as a replacement of an existing notebook, so I'm okay with it being .ipynb. This is being linked to from our landing page, so I'm in favor of expediency here.

Then I'd like to remove the old one in this PR and add a new .py file instead.

adrinjalali avatar Jun 08 '22 09:06 adrinjalali

I agree that we should discourage creating new notebooks in .ipynb, but I'm viewing this specific PR as a replacement of an existing notebook, so I'm okay with it being .ipynb. This is being linked to from our landing page, so I'm in favor of expediency here.

Then I'd like to remove the old one in this PR and add a new .py file instead.

I've opened #1108 to discuss this point...

MiroDudik avatar Jun 10 '22 18:06 MiroDudik

@MiroDudik I made all the changes you requested. You can take another look to see if there is anything else I need to change. I also deleted the old notebook.

I'll add the new unit tests in a later commit.

LeJit avatar Jun 10 '22 19:06 LeJit

Thank you @romanlutz

Of course! For context, we discussed this in the community call last week. Since there were quite a few ipynb-to-py (and md to rst) peculiarities that needed addressing I volunteered to take care of it as I've done this a few times. I suppose I should have provided a full page screenshot to make this easier to review: screencapture-file-C-Users-Roman-git-fairlearn-docs-build-html-auto-examples-plot-credit-loan-decisions-html-2022-12-19-11_42_18 screencapture-file-C-Users-Roman-git-fairlearn-docs-build-html-auto-examples-plot-credit-loan-decisions-html-2022-12-19-11_42_18-2

... and I already found a couple of $ signs that I hadn't converted yet. 😆

romanlutz avatar Dec 19 '22 16:12 romanlutz

@LeJit why is seaborn in the dependencies? I don't see it used.

romanlutz avatar Dec 19 '22 17:12 romanlutz

@LeJit why is seaborn in the dependencies? I don't see it used.

@romanlutz There were some things with visualizing and comparing error rates across different models at the end that Miro and I wanted to, but we eventually decided to cut. You can remove the seaborn dependency and I believe the CalibrationClassifierCV import as well.

LeJit avatar Dec 19 '22 18:12 LeJit

I'll try to get to it today. I have a bunch of tiny fixes locally as well. I suspect the linter will make me do a whole lot of fixes, too.

romanlutz avatar Dec 21 '22 17:12 romanlutz

You may want to merge PR #1163 to make the linter work again.

MiroDudik avatar Dec 21 '22 22:12 MiroDudik

This is now up-to-date with latest main and I've fixed all flake8 issues. The remaining open issues afaict:

  • we're not using fetch_credit_card because the column names are off (see corresponding comment)
  • we're not using plot_model_comparison

These seem like easily fixable problems in small follow-up PRs that don't affect the example in terms of validity. If either of you (@MiroDudik, @LeJit ) want to take a stab at them please feel free to push your updates directly. I've done the part that I had promised 😄

romanlutz avatar Dec 23 '22 02:12 romanlutz

I think it would be better to make those changes in follow-up PRs. I really can't think of any way to identify the Pareto dominated inner models that doesn't require us to compute the balanced_accuracy and equalized_odds for each of the inner models, so we're always going to have to do that inner models sweep.

Also, if we want an example using plot_model_comparison, I personally think the SciPy/PyCon tutorial would be a better place to incorporate that method since we're not filtering dominated classifiers in that example. What do you think @MiroDudik ?

LeJit avatar Dec 23 '22 19:12 LeJit

i tried to deal with the issue of crashing build of the website, but the issue is that the sphinx gallery simply takes too long to build on CircleCI...

UPDATE: I doubled the webpage build time to 60min, but it's still timing out.

MiroDudik avatar Jan 05 '23 00:01 MiroDudik