Filter top K edges also based on the count of shared peaks (and not only cosine)
Is your feature request related to a problem? Please describe.

My feature request is not related to any problem, just a suggestion.

Describe the solution you'd like

I would like to be able to:
- Sort the edges by number of matched peaks in descending order, just as is done for cosine scores
- Optionally, filter the top K based on this order, or on a combination of this order and the order given by the cosine scores
Given this original table:
| source | target | cosine | shared peaks |
|---|---|---|---|
| 1 | 2 | 0.90 | 6 |
| 1 | 3 | 0.89 | 97 |
| 1 | 4 | 0.88 | 8 |
| 1 | 5 | 0.87 | 10 |
| 1 | 6 | 0.86 | 9 |
| 1 | 7 | 0.85 | 7 |
| 1 | 8 | 0.84 | 110 |
| 1 | 9 | 0.83 | 100 |
The current filtering yields the following table (for top K = 5, min cosine = 0.6, min shared peaks = 0.6):
| source | target | cosine | shared peaks |
|---|---|---|---|
| 1 | 2 | 0.90 | 6 |
| 1 | 3 | 0.89 | 97 |
| 1 | 4 | 0.88 | 8 |
| 1 | 5 | 0.87 | 10 |
| 1 | 6 | 0.86 | 9 |
What I would like to have is (weights can be discussed):
| source | target | cosine | shared peaks | rank cosine | rank shared peaks | final rank (here 50:50) |
|---|---|---|---|---|---|---|
| 1 | 2 | 0.90 | 6 | 1 | 8 | 3 (sum = 9) |
| 1 | 3 | 0.89 | 97 | 2 | 3 | 1 (sum = 5) |
| 1 | 4 | 0.88 | 8 | 3 | 6 | 3 (sum = 9) |
| 1 | 5 | 0.87 | 10 | 4 | 4 | 2 (sum = 8) |
| 1 | 6 | 0.86 | 9 | 5 | 5 | 4 (sum = 10) |
| 1 | 7 | 0.85 | 7 | 6 | 7 | 5 (sum = 13) |
| 1 | 8 | 0.84 | 110 | 7 | 1 | 2 (sum = 8) |
| 1 | 9 | 0.83 | 100 | 8 | 2 | 4 (sum = 10) |
which, for top K = 5, finally results in the following filtered table:
| source | target | cosine | shared peaks | rank cosine | rank shared peaks | final rank (here 50:50) |
|---|---|---|---|---|---|---|
| 1 | 3 | 0.89 | 97 | 2 | 3 | 1 (sum = 5) |
| 1 | 5 | 0.87 | 10 | 4 | 4 | 2 (sum = 8) |
| 1 | 8 | 0.84 | 110 | 7 | 1 | 2 (sum = 8) |
| 1 | 2 | 0.90 | 6 | 1 | 8 | 3 (sum = 9) |
| 1 | 4 | 0.88 | 8 | 3 | 6 | 3 (sum = 9) |
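
To make the proposal concrete, here is a minimal Python sketch of the combined ranking, using the values from the ranked table above. Everything in it is an assumption for illustration: the function names `dense_ranks` and `top_k_by_combined_rank`, the tuple layout, the weight parameters, and the tie-break on cosine are hypothetical and are not taken from `molecular_network_filtering_library.py`. It also ranks per source node only, which may well be simpler than what the existing top K filter does.

```python
from collections import defaultdict

# Edges as (source, target, cosine, shared_peaks); values from the ranked table.
EDGES = [
    (1, 2, 0.90, 6),
    (1, 3, 0.89, 97),
    (1, 4, 0.88, 8),
    (1, 5, 0.87, 10),
    (1, 6, 0.86, 9),
    (1, 7, 0.85, 7),
    (1, 8, 0.84, 110),
    (1, 9, 0.83, 100),
]


def dense_ranks(values):
    """Map each value to its dense rank (1 = highest value; ties share a rank)."""
    ordered = sorted(set(values), reverse=True)
    return {value: rank for rank, value in enumerate(ordered, start=1)}


def top_k_by_combined_rank(edges, k, w_cosine=0.5, w_peaks=0.5):
    """Keep the k best edges per source node by weighted rank sum.

    Hypothetical helper for illustration, not part of GNPS_sharedcode.
    """
    by_source = defaultdict(list)
    for edge in edges:
        by_source[edge[0]].append(edge)

    kept = []
    for source_edges in by_source.values():
        cosine_rank = dense_ranks([e[2] for e in source_edges])
        peaks_rank = dense_ranks([e[3] for e in source_edges])
        # Lower combined score is better; ties are broken by higher cosine
        # (an assumed tie-break, the issue does not specify one).
        ranked = sorted(
            source_edges,
            key=lambda e: (
                w_cosine * cosine_rank[e[2]] + w_peaks * peaks_rank[e[3]],
                -e[2],
            ),
        )
        kept.extend(ranked[:k])
    return kept
```

With the default 50:50 weights, `[e[1] for e in top_k_by_combined_rank(EDGES, k=5)]` gives targets 3, 5, 8, 2, 4, the same rows as the filtered table above. Setting `w_cosine=1, w_peaks=0` falls back to a plain cosine ordering, so the weights could be exposed as a parameter without changing the default behaviour.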
Hope this makes sense, happy to elaborate if needed! 😊
Additional context

The code that would need to be modified is at https://github.com/mwang87/GNPS_sharedcode/blob/8283c5ce154eda266b4e5fce8747845fa0314d08/molecular_network_filtering_library.py#L383
I think the goal here is to create some normalized score that has some idea of equal reliability. This has a lot of parallels to this:
https://pubs.acs.org/doi/10.1021/pr400230p
We did some work a few years ago to find equivalences in the small-molecule space; maybe this would be a good collaboration that we could publish together.