Selector icon indicating copy to clipboard operation
Selector copied to clipboard

[Distance module] Add Formulas to Docstrings and Add Tests

Open FarnazH opened this issue 2 years ago • 3 comments

Finalize the functions in distance module by:

  1. Unifying docstrings and adding formulas (that match the implementation)
  2. Add tests for each function (to cover all lines) and tricky cases
  3. Polish code to make sure it is readable (e.g., add comments) and pythonic
### Tasks
- [x] Move `sklearn_supported_metrics` from `utils` module to `distance` module
- [x] Move tests related to distance module into `test_distance.py`
- [x] Remove support for `sklearn.pairwise_distances` function as it only adds overhead to the code.
- [x] Add formulas
- [x] Add tests
### Tasks - Post PR#129
- [x] Test coverges: missing lines 184 and 189.
- [x] Fix math formula: https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/distance.py#L149
- [x] Having both `pairwise_similarity_bit` and `compute_distance_matrix` seems redundant considering the fact that we have the #123 module.
- [x] Remove empty lines at the end of the modules, like https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/test/test_distance.py#L99
- [x] Add tests to explicitly test `tanimoto` and `modified_tanimoto`
- [x] Implementation needs to be improved and further clarified: https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/distance.py#L223
- [x] Needs more clarification in docstring and comments of https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/distance.py#L143

FarnazH avatar Jun 03 '23 15:06 FarnazH

I leave a note here, as this PR #137 is already merged. Before this issue can be closed:

  • The modified_tanimoto still needs to be finalized by @FarnazH.
  • The module name should be changed as it contains similarity measures, not distances.
  • Shouldn't the nearest_average_tanimoto be moved to the diversity.py module?

FarnazH avatar Jul 09 '23 13:07 FarnazH

One more thing we need to have is adding explanations for MaxMin and OptiSim as discussed in #119.

FanwangM avatar Jul 13 '23 02:07 FanwangM

Yes, this is a diversity measure. I agree that it should be moved.

  • Shouldn't the nearest_average_tanimoto be moved to the diversity.py module?

PaulWAyers avatar Jul 13 '23 19:07 PaulWAyers