
[Docs] support of different metrics for binary fingerprints or descriptors

Open FanwangM opened this issue 3 years ago • 4 comments

Some metrics, such as Tanimoto, only work on binary fingerprints. Others work in both cases, and some work only on non-binary matrices, such as molecular descriptors. Can you give a list or table summarizing this information? We will need it to refactor the metrics module. Thank you. @Khaleeh

FanwangM avatar May 12 '22 17:05 FanwangM

With @Khaleeh we agreed that when non-binary data is given to a binary metric, we would apply it "blindly", using the way Python folds numerical data into logical data, namely False = 0 and True = [anything except zero] => 1. The idea is that the binary fingerprint of non-binary data is "this property is true unless it is zero." That seems sensible in many cases. E.g., a descriptor n_feature, which counts the number of times a feature occurs, is False if that feature never occurs and True if it occurs one or more times.
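A minimal sketch of that folding rule with NumPy, for illustration only. The helper name `binarize` and the example counts are hypothetical and are not taken from Selector's code; the sketch only assumes the rule stated above (zero becomes 0, anything nonzero becomes 1).

```python
import numpy as np


def binarize(x):
    """Fold numeric descriptors into bits: zero -> 0, anything nonzero -> 1.

    Hypothetical helper for illustration; not part of Selector's API.
    """
    return (np.asarray(x) != 0).astype(int)


# Two molecules described by three count-style descriptors (made-up numbers).
counts = np.array([[0, 3, 1],
                   [2, 0, 0]])
bits = binarize(counts)
print(bits)
```

Negative and fractional values are also mapped to 1 under this rule, which matches how Python's `bool()` treats nonzero numbers.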

PaulWAyers avatar May 12 '22 19:05 PaulWAyers

euc_bit is binary. Tanimoto is non-binary (I think it also works with binary data, but this should be confirmed). bit_tanimoto is binary. modified_tanimoto is binary (a non-bit to bit converter can be added). entropy is best for binary data but has a converter, as @PaulWAyers mentioned. (Looking at this again, it might be good to add if (max(map(max, x))) > 1: at the beginning, so that the converter only runs on a non-binary matrix.) nearest_average_tanimoto and explicit_diversity_index are binary (untested methods). logdet is non-binary. shannon_entropy is binary. wdud is non-binary (not done? not tested). total_diversity_volume is binary.
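A rough sketch of the guard suggested above, assuming the goal is to convert only when the matrix is not already binary. The names `is_binary` and `ensure_binary` are hypothetical, not Selector functions. Note that a check based on the maximum alone would miss fractional or negative entries (for example, `max(map(max, [[0.5, 0]])) > 1` is False), so this sketch tests every element against {0, 1} instead.

```python
import numpy as np


def is_binary(x):
    """Return True if every entry of x is exactly 0 or 1."""
    return bool(np.isin(np.asarray(x), (0, 1)).all())


def ensure_binary(x):
    """Return x unchanged if it is already binary, else fold nonzero -> 1.

    Hypothetical helper for illustration; not part of Selector's API.
    """
    x = np.asarray(x)
    if is_binary(x):
        return x
    return (x != 0).astype(int)


fingerprints = [[0, 1], [1, 1]]         # already binary, left untouched
descriptors = [[0.5, 0.0], [2.0, -1.0]]  # non-binary, gets converted
print(ensure_binary(fingerprints))
print(ensure_binary(descriptors))
```

Whether a binary-only metric should convert silently like this or raise an error is a design choice for the refactor; the sketch only shows the detection step.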

entropy and logdet seem to be the best.

Khaleeh avatar May 12 '22 23:05 Khaleeh

Document the True and False elements in more detail.

FanwangM avatar May 13 '22 22:05 FanwangM

Add more docstrings for this to make it clear to users. This is related to #121.

FanwangM avatar May 29 '23 08:05 FanwangM

I think it's clear with respect to https://github.com/theochem/Selector/issues/128.

FanwangM avatar Jun 25 '24 15:06 FanwangM