Fairness review on deep reference parser algorithm
Tasks:
- [x] Discuss and decide fairness criteria.
- [ ] Conduct fairness review.
- [ ] Report to team.
The deep reference parser needs to undergo a fairness review. Before this can happen we need to answer the following questions:
How do we define fairness in this case?
- Do we care about treating people the same across different groups?
- What are the groups, if any, that we think are important?
- Or do we care about treating every single individual the same?
How does our definition translate into testing the algorithm for fairness?
- Which metric(s) are most appropriate for our case?
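Once we settle on a definition, the check itself can be fairly mechanical. As a minimal sketch (not part of the current pipeline), disaggregated evaluation could look something like the snippet below: it assumes we have per-reference match counts with a group label attached, and the field names and groups (`"group"`, `"matched"`, `"relevant"`, `"sociology"`) are illustrative, not taken from the reach codebase.

```python
# Hypothetical sketch of a disaggregated fairness check: per-group recall
# and the largest gap between groups. Field names ("group", "matched",
# "relevant") are illustrative, not from the reach codebase.
from collections import defaultdict


def group_recall(records):
    """records: iterable of dicts like
    {"group": "sociology", "matched": 12, "relevant": 20}"""
    matched = defaultdict(int)
    relevant = defaultdict(int)
    for r in records:
        matched[r["group"]] += r["matched"]
        relevant[r["group"]] += r["relevant"]
    return {g: matched[g] / relevant[g] for g in relevant if relevant[g]}


def max_recall_gap(records):
    """Group fairness read as parity of recall: the largest pairwise gap."""
    recalls = group_recall(records)
    return max(recalls.values()) - min(recalls.values())


if __name__ == "__main__":
    sample = [
        {"group": "sociology", "matched": 12, "relevant": 20},
        {"group": "biomedical", "matched": 45, "relevant": 50},
    ]
    print(group_recall(sample))    # {'sociology': 0.6, 'biomedical': 0.9}
    print(max_recall_gap(sample))  # 0.3 — a gap this large would fail review
```

The same shape works for precision or F1; which metric we gate on is exactly the open question above.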
I think the analysis should be the same as what we have done so far, just replicated using the new model. We should aim to do it end to end, but given the limited data we might find that we need to annotate more before we can. In that case we might decide to postpone the end-to-end part and simply replicate the analysis on the existing data.
Why would we need to label more data @nsorros? Btw I said to @aoifespenge today that I envisage this being another airflow task that is completed at the end of a dag just like the end-to-end evaluation for the more usual metrics. Is that what you had in mind?
Not a bad idea, we can definitely have it in Airflow as well. Even though the analysis would only change when a new model is deployed.
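To make the Airflow idea concrete, here is a minimal sketch of a fairness task appended at the end of a DAG, mirroring the existing end-to-end evaluation. It assumes Airflow 2.x; the DAG id, task ids, and the `evaluate_fairness` callable are all hypothetical, not names from the actual reach DAGs.

```python
# Hypothetical sketch: a fairness-evaluation task that runs last in a DAG,
# just like the end-to-end evaluation. All names here are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def evaluate_fairness(**context):
    # Placeholder: load the gold data, run the deployed model, compute the
    # agreed per-group metrics, and fail the task if a gap is too large.
    pass


with DAG(
    dag_id="reach_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
    evaluate = PythonOperator(
        task_id="end_to_end_evaluation",
        python_callable=lambda: None,  # stand-in for the existing eval task
    )
    fairness = PythonOperator(
        task_id="fairness_evaluation",
        python_callable=evaluate_fairness,
    )
    evaluate >> fairness  # fairness check runs at the end of the DAG
```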
We would need to label more data because how would you quantify whether the algorithm is biased towards sociology or non-English publications if none of your data contains either?
Sorry, I guess my question was more: which data should we label more of?
I think it would make sense to run the ethical assessment after each update to reach, not just the model, because an improvement to the scraper or the addition of a new provider could equally affect the fairness of the whole pipeline.
Good point, and why not. The data that might need more annotating is the gold data: more titles that are matched to PubMed IDs and have the necessary metadata.
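For gathering that metadata, one option is NCBI's E-utilities esummary endpoint, which returns language and journal for a given PMID. A rough sketch follows; the endpoint is real, but which fields we keep (and using journal as a proxy for discipline) is an assumption for the review to settle.

```python
# Rough sketch: pull language/journal metadata for matched PubMed IDs via
# NCBI E-utilities, to support disaggregated analysis. Which fields to keep
# is an assumption; adjust to whatever groups the review settles on.
import requests

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def pubmed_metadata(pmid):
    resp = requests.get(
        ESUMMARY,
        params={"db": "pubmed", "id": pmid, "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    doc = resp.json()["result"][str(pmid)]
    return {
        "pmid": str(pmid),
        "lang": doc.get("lang", []),            # e.g. ["eng"]
        "journal": doc.get("fulljournalname"),  # rough proxy for discipline
    }


if __name__ == "__main__":
    print(pubmed_metadata(31452104))  # any valid PMID works here
```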
Agree. More data is good data.
This is currently blocked by https://github.com/wellcometrust/reach/issues/48