RTX
Change how we handle SemMedDB edges in ranking
SemMedDB seems to be returning lots of odd edges and bad results that get pushed higher in rankings.
Lots of potential options to address this:
- [x] Use subject and object confidence scores in ranking
- [ ] Use subject and object novelty in ranking
- [x] Condense SemMedDB edges into one edge
- [ ] SemMedDB antonym handling
Should look into averaging SemMedDB edge publication counts using:
- harmonic mean
- geometric mean
- median
- arithmetic mean
- L-infinity
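
As a rough sketch of how these options could be compared side by side (not the code on the branch): the helper `combine_publication_counts` and the `AVERAGING_METHODS` table below are hypothetical names, and it assumes each SemMedDB edge contributes one positive publication count. "L-infinity" is read here as the L-infinity norm of the counts, which for positive values is just the maximum.

```python
from statistics import fmean, geometric_mean, harmonic_mean, median

# Hypothetical lookup table; the names are illustrative, not from the RTX codebase.
AVERAGING_METHODS = {
    "harmonic": harmonic_mean,
    "geometric": geometric_mean,
    "median": median,
    "arithmetic": fmean,
    "l_infinity": max,  # assumption: L-infinity norm of positive counts == largest count
}


def combine_publication_counts(counts, method="arithmetic"):
    """Collapse the publication counts of several edges into one number."""
    if not counts:
        raise ValueError("need at least one publication count")
    if any(c <= 0 for c in counts):
        # The geometric mean is undefined for non-positive values,
        # so this sketch rejects them for every method.
        raise ValueError("publication counts must be positive")
    return float(AVERAGING_METHODS[method](counts))


# Compare all methods on the same made-up counts.
example_counts = [1, 4, 16]
for name in AVERAGING_METHODS:
    print(name, combine_publication_counts(example_counts, name))
```

The harmonic and geometric means are pulled toward the small counts, while the arithmetic mean and L-infinity are dominated by the largest one, so the choice decides how much a single heavily published edge can lift a condensed edge in the ranking.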
On branch issue1695. We should test the different averaging methods when combining multiple SemMedDB edges and see which ones we like. Issue #1684 is needed for the other items.
Closing, as @mfl15's approach for filtering SemMedDB will likely fix this issue.