Selector icon indicating copy to clipboard operation
Selector copied to clipboard

[Diversity module] Add Formulas to Docstrings, Add Tests, & Polish Module

Open FarnazH opened this issue 2 years ago • 2 comments

See: https://github.com/theochem/DiverseSelector/issues/121

For each function (mentioned in #121), make sure that:

  • [x] implementation is clear with comments.
  • [x] docstring is polished and gives the formula implemented (add a reference to the paper used for implementation)
  • [ ] tests have 100% coverage

FarnazH avatar Jun 09 '23 19:06 FarnazH

While trying to write documentation for this module, these are some of the implementation errors I ran into:

  • the logdet function, as currently implemented according to this source, requires a features x mols matrix, while elsewhere in the package our standard is a mols x features matrix. Do I change the implementation or just note this in the docs?

  • shannon_entropy() is not correctly implemented, and does not match the cited paper/equation. There is a comment to add a raise error, but reading through the cited paper I see no reason why there would be a value error in its calculation. Do you want me to rewrite this function?

  • total_diversity_volume is also not implemented according to its paper. The function currently returns the degree of overlap between molecule neighborhoods, not the total diversity volume. I think we should rename the function instead of altering the implementation, as in the paper, total diversity volume does not describe a sub-set of data points, but the entire input space.

  • The Wasserstien distance function wdud() needs to be rewritten

AWBroscius avatar Jun 14 '23 13:06 AWBroscius

Post - PR #135:

  • [x] 93% Test Coverage (missing lines are 75-78, 240, 249, 314, 319, 384)
  • [x] Fix formula https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/diversity.py#L86
  • [x] gini_coefficient: Rename argument name a to x to be consistent with other functions
  • [x] hypersphere_overlap_of_subset: Rename lib and x into x and x_subset
  • [x] Remove redundant arguments https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/diversity.py#L57
  • [x] total_diversity_volume was renamed, but it was not changed everywhere. The tests fail as a result of this! https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/diversity.py#L45

FarnazH avatar Jun 20 '23 01:06 FarnazH