blocklib
Ideas for extra signature strategies
Possible additional strategies for signature generation, with proposed names:
Existing strategies
- ExactCharMatchSig: the letter at a given index (implemented as `generate_by_char_at`)
- ExactMatchSig: the value of the whole field (implemented as `generate_by_feature_value`)
- WordSoundSimilarSig: comparison of the sound of the word using Metaphone (implemented as `generate_by_metaphone`)
New strategies
- FirstWordSig: the first word of the field
- LastWordSig: the last word of the field
- InitialLastWordSig: the first letter and last word of the field (e.g. for formatted names)
- AnyWordSig: any of the words in the field
- WordNGramsSig: n-grams of words extracted from a text field
- LetterNGramsSig: n-grams of letters extracted from a text field
- LastNWordsSig: the last n words of the field
- FirstNWordsSig: the first n words of the field
- ArrayCombinationSig: all n-grams of an array field
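
To make the proposed strategies more concrete, here is a minimal Python sketch of what each one might compute. None of these functions exist in blocklib; the names, signatures, whitespace tokenisation, and return types are all assumptions made for illustration. In particular, ArrayCombinationSig is read here as contiguous n-grams over the array elements, following its description; unordered combinations of elements would be another plausible reading.

```python
from typing import List, Sequence, Tuple


def first_n_words(value: str, n: int = 1) -> str:
    """FirstWordSig (n=1) / FirstNWordsSig: the first n words of the field."""
    return " ".join(value.split()[:n])


def last_n_words(value: str, n: int = 1) -> str:
    """LastWordSig (n=1) / LastNWordsSig: the last n words of the field."""
    words = value.split()
    # Guard n <= 0, because words[-0:] would return every word.
    return " ".join(words[-n:]) if n > 0 else ""


def initial_last_word(value: str) -> str:
    """InitialLastWordSig: first letter of the field plus its last word."""
    words = value.split()
    if not words:
        return ""
    return f"{words[0][0]} {words[-1]}"


def any_word(value: str) -> List[str]:
    """AnyWordSig: one signature per distinct word, in order of appearance."""
    return list(dict.fromkeys(value.split()))


def word_ngrams(value: str, n: int) -> List[str]:
    """WordNGramsSig: contiguous runs of n words (assumes n >= 1)."""
    words = value.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def letter_ngrams(value: str, n: int) -> List[str]:
    """LetterNGramsSig: contiguous runs of n characters (assumes n >= 1)."""
    return [value[i:i + n] for i in range(len(value) - n + 1)]


def array_ngrams(values: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """ArrayCombinationSig (assumed reading): contiguous n-grams of elements."""
    return [tuple(values[i:i + n]) for i in range(len(values) - n + 1)]
```

Strategies such as AnyWordSig and the n-gram variants yield several signatures per record, so a record could land in more than one block. The single-value strategies (first, last, and initial plus last word) yield exactly one.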