Database identifiers should be sorted by closest match

Open DeniseSl22 opened this issue 5 years ago • 0 comments

This issue was raised in the PathVisio issue tracker by @egonw, but @mkutmon and I agree that it belongs here.

Currently, PathVisio (PV) uses the BridgeDb class freeAttributeSearch to search for free text (names of genes/proteins/compounds) in the locally loaded BridgeDb mapping files. The results do not seem to be sorted in a useful way: for example, a search for "TP53" first returns some longer names that contain the phrase TP53, and only then the exact string 'TP53'. This also happens for metabolites (see the issue on PV). @ariutta suggested: "You could use Levenshtein distance."

This sorting should happen in the results produced by freeAttributeSearch, so that PV automatically displays them in that order. Example code on how to build a custom comparator, and on one that uses the Levenshtein distance, could serve as a starting point.
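To make the suggestion concrete, here is a minimal sketch of such a comparator. It is not BridgeDb code: the class `ClosestMatchSort` and the methods `levenshtein` and `sortByClosestMatch` are hypothetical names, and the sketch assumes the search results are available as plain name strings rather than as whatever freeAttributeSearch actually returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of BridgeDb: sorts names by edit distance to the query.
class ClosestMatchSort {

    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a sorted copy: smallest distance to the query first,
    // ties broken alphabetically. Comparison is case-insensitive (an assumption).
    static List<String> sortByClosestMatch(String query, List<String> names) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator
                .comparingInt((String name) -> levenshtein(q, name.toLowerCase(Locale.ROOT)))
                .thenComparing(Comparator.naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("TP53BP1", "TP53INP1", "TP53", "TP53I3");
        // prints [TP53, TP53I3, TP53BP1, TP53INP1]
        System.out.println(sortByClosestMatch("TP53", hits));
    }
}
```

With this ordering the exact match 'TP53' comes first, followed by the names that need the fewest edits to reach the query. The distance is recomputed on every comparison, which is probably fine for short result lists; for larger ones the distances could be computed once per name before sorting.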

DeniseSl22 • May 01 '19 12:05