[Diversity module] Add Formulas to Docstrings, Add Tests, & Polish Module
See: https://github.com/theochem/DiverseSelector/issues/121
For each function (mentioned in #121), make sure that:
- [x] implementation is clear with comments.
- [x] docstring is polished and gives the formula implemented (add a reference to the paper used for implementation)
- [ ] tests have 100% coverage
While trying to write documentation for this module, these are some of the implementation errors I ran into:
-
the logdet function, as currently implemented according to this source, requires a features x mols matrix, while elsewhere in the package our standard is a mols x features matrix. Do I change the implementation or just note this in the docs?
-
shannon_entropy() is not correctly implemented, and does not match the cited paper/equation. There is a comment to add a raise error, but reading through the cited paper I see no reason why there would be a value error in its calculation. Do you want me to rewrite this function?
-
total_diversity_volume is also not implemented according to its paper. The function currently returns the degree of overlap between molecule neighborhoods, not the total diversity volume. I think we should rename the function instead of altering the implementation, as in the paper, total diversity volume does not describe a sub-set of data points, but the entire input space.
-
The Wasserstien distance function wdud() needs to be rewritten
Post - PR #135:
- [x] 93% Test Coverage (missing lines are 75-78, 240, 249, 314, 319, 384)
- [x] Fix formula https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/diversity.py#L86
- [x]
gini_coefficient: Rename argument nameatoxto be consistent with other functions - [x]
hypersphere_overlap_of_subset: Renamelibandxintoxandx_subset - [x] Remove redundant arguments https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/diversity.py#L57
- [x]
total_diversity_volumewas renamed, but it was not changed everywhere. The tests fail as a result of this! https://github.com/theochem/DiverseSelector/blob/main/DiverseSelector/diversity.py#L45