[Docs] support of different metrics for binary fingerprints or descriptors
Some metrics, such as Tanimoto, only work on binary fingerprints. Others work in both cases, and some work only on non-binary matrices, such as molecular descriptors. Could you provide a list or table that summarizes this information? We will need it to refactor the metrics module. Thank you.
@Khaleeh
With @Khaleeh we agreed that when non-binary data is given to a binary metric, we would apply it "blindly" using the way Python folds numerical data into logical data, namely False = 0 and True = [anything except zero] => 1. The idea is that the binary fingerprint of non-binary data encodes "this property is true unless it is zero." That seems sensible in many cases. For example, a descriptor n_feature that counts the number of times a feature occurs is False if that feature never occurs, and True if it occurs one or more times.
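To make the folding rule concrete, here is a minimal sketch of how count-type descriptors could be turned into a binary fingerprint. The helper name fold_to_binary and the example matrix are assumptions made for illustration, not part of the existing metrics module.

```python
import numpy as np


def fold_to_binary(x):
    """Hypothetical helper: map zero to 0 and any nonzero entry to 1."""
    x = np.asarray(x)
    # "x != 0" is True for every nonzero entry, including negative
    # and fractional values, which mirrors Python truthiness for numbers.
    return (x != 0).astype(int)


# Rows are molecules, columns are made-up count descriptors such as n_feature.
counts = np.array([[0, 3, 1],
                   [2, 0, 0]])
bits = fold_to_binary(counts)
print(bits)
```

Note that this rule discards magnitude: a feature that occurs once and a feature that occurs ten times both become 1.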
euc_bit is binary.
Tanimoto is non-binary (I think it also works with binary data, but this should be confirmed).
bit_tanimoto is binary.
modified_tanimoto is binary (a non-bit to bit converter can be added).
entropy is best for binary data but has a converter, as @PaulWAyers mentioned. (Looking at this again, it might be good to add if (max(map(max, x))) > 1: at the beginning, so that the converter only runs when the input is a non-binary matrix.)
nearest_average_tanimoto and explicit_diversity_index are binary (untested methods).
logdet is non-binary.
shannon_entropy is binary.
wdud is non-binary (not done? not tested).
total_diversity_volume is binary.
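Regarding the open question of whether the non-binary Tanimoto also works on binary input, the sketch below compares the two textbook definitions. The function names tanimoto_real and tanimoto_bits are hypothetical, and the formulas are the standard ones, which may differ from what the module actually implements.

```python
import numpy as np


def tanimoto_real(a, b):
    """Textbook real-valued Tanimoto: a.b / (a.a + b.b - a.b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ab = np.dot(a, b)
    return ab / (np.dot(a, a) + np.dot(b, b) - ab)


def tanimoto_bits(a, b):
    """Textbook bit Tanimoto: |A and B| / |A or B| after folding to bool."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return np.sum(a & b) / np.sum(a | b)


# On binary input the two definitions agree.
a = [1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0]
print(tanimoto_real(a, b), tanimoto_bits(a, b))

# On count data they differ, because folding to bits drops the magnitudes.
c = [2, 0, 1]
d = [1, 1, 0]
print(tanimoto_real(c, d), tanimoto_bits(c, d))
```

For 0/1 vectors the dot products reduce to set sizes, so the real-valued formula gives the same value as the bit version. Both functions divide by zero when both inputs are all zeros, so a real implementation would need to handle that case.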
entropy and logdet seem to be the best.
Document the True and False elements with more details.
Add more docstrings for this to make it clear to users. This is related to #121.
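As one possible shape for such documentation, here is a sketch of a documented check for binary input. The name is_binary is an assumption, not an existing function in the module. It is stricter than the max(map(max, x)) > 1 test suggested above, which would treat a matrix containing 0.5 or negative values as binary.

```python
import numpy as np


def is_binary(x):
    """Return True if every entry of ``x`` is exactly 0 or 1.

    Hypothetical helper, shown only as a docstring example.

    Parameters
    ----------
    x : array_like
        Feature matrix (rows are samples, columns are features).

    Returns
    -------
    bool
        True when ``x`` contains only 0 (False) and 1 (True) entries,
        so no conversion to a binary fingerprint is needed.
    """
    x = np.asarray(x)
    return bool(np.all((x == 0) | (x == 1)))


binary = [[0, 1], [1, 0]]
fractional = [[0.5, 0.0], [1.0, 0.0]]

print(is_binary(binary))                # already binary
print(is_binary(fractional))            # contains 0.5, so not binary
print(max(map(max, fractional)) > 1)    # the max-based test misses this case
```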
I think it's clear with respect to https://github.com/theochem/Selector/issues/128.