gateplugin-LearningFramework

Implement additional feature functions, like wordshape and character n-grams

Open johann-petrak opened this issue 8 years ago • 4 comments

Implement some standard feature functions: wordshape, character n-grams with a maximum n or a range of n values, and prefixes or suffixes of length <= n. These features should also be usable in windows (ATTRLIST). In theory we could create these beforehand as separate per-instance features, but having the feature generation code do this is more convenient. A more complex functionality would be generating certain features only for rare instances, making use of a pre-computed frequency table (see Curran and Clark 2003, Language Independent NER Using a Maximum Entropy Tagger). Try to be compatible with, or create some features similar to, what the Stanford feature factory does: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html
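As a rough illustration of the kind of functions meant here, the following Python sketch computes a word shape, prefixes and suffixes up to a maximum length, and a windowed version of the shape feature. The function names, the shape alphabet (X, x, d) and the feature naming scheme are assumptions made for this example only; they are not the plugin's actual API, and the plugin itself is not written in Python.

```python
from typing import Dict, List


def word_shape(token: str) -> str:
    """Map uppercase letters to 'X', lowercase to 'x', digits to 'd'.

    All other characters (punctuation etc.) are kept unchanged.
    """
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def affixes(token: str, max_len: int) -> Dict[str, str]:
    """Return all prefixes and suffixes of length 1..max_len.

    Lengths longer than the token itself are skipped.
    """
    feats: Dict[str, str] = {}
    for n in range(1, min(max_len, len(token)) + 1):
        feats[f"prefix{n}"] = token[:n]
        feats[f"suffix{n}"] = token[-n:]
    return feats


def window_shapes(tokens: List[str], i: int, left: int, right: int) -> Dict[str, str]:
    """Word shapes for the tokens in a window around position i.

    Positions that fall outside the sequence are simply omitted.
    """
    feats: Dict[str, str] = {}
    for offset in range(-left, right + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"shape[{offset}]"] = word_shape(tokens[j])
    return feats
```

For example, `word_shape("McDonald-42")` gives `"XxXxxxxx-dd"`, and `window_shapes(["Dr", "Smith", "2016"], 1, 1, 1)` yields one shape feature per window position.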

johann-petrak avatar Apr 06 '16 18:04 johann-petrak

If the feature function is an actual function of a value that is easily available from the original annotation, the function should be implemented as a static method that can be used from any client code. That way the feature function can be (pre)calculated in a separate step. However, some of the values from which to calculate the function may only be available as a result of feature extraction, so we need a way to specify all these functions in the attribute definition.
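One possible way to reconcile the two requirements is a registry of plain functions that can be called directly by client code and also looked up by name from an attribute definition. The sketch below is only an illustration of that idea; the registry, the function names and the lookup scheme are hypothetical and not part of the plugin.

```python
from typing import Callable, Dict, List

# Hypothetical registry: each entry is a pure function of a string value,
# so it can be called directly (pre-calculation) or looked up by name
# (when referenced from an attribute definition).
FEATURE_FUNCTIONS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "length": lambda value: str(len(value)),
    "suffix3": lambda value: value[-3:],
}


def apply_functions(value: str, names: List[str]) -> Dict[str, str]:
    """Apply the named feature functions to a single extracted value."""
    return {name: FEATURE_FUNCTIONS[name](value) for name in names}
```

Calling `apply_functions("Running", ["lowercase", "suffix3"])` would then produce both derived values from the same extracted string, regardless of whether that string came from the original annotation or from an earlier extraction step.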

johann-petrak avatar Jun 20 '16 16:06 johann-petrak

Since LF can now make use of list-, set- and map-valued features, even character n-grams can be pre-calculated, though doing it this way will probably increase the memory required for each document considerably.
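To get a feeling for the size of that increase, one can count how many n-grams would have to be stored per token. The sketch below is a back-of-the-envelope helper, not a measurement of the plugin; the function names are made up for this example, and it ignores the per-object overhead of the feature values themselves.

```python
from typing import Iterable


def ngram_count(length: int, n_from: int, n_to: int) -> int:
    """Number of character n-grams of sizes n_from..n_to in a string of the given length."""
    return sum(max(0, length - n + 1) for n in range(n_from, n_to + 1))


def document_ngram_count(tokens: Iterable[str], n_from: int, n_to: int) -> int:
    """Total number of n-gram strings stored if every token keeps its own list."""
    return sum(ngram_count(len(tok), n_from, n_to) for tok in tokens)
```

A five-character token with n from 2 to 4 already yields 4 + 3 + 2 = 9 strings, so a pre-calculated list-valued feature stores several times more strings than there are tokens.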

johann-petrak avatar Jul 13 '16 13:07 johann-petrak

See also #48

johann-petrak avatar Aug 10 '17 09:08 johann-petrak

Character n-grams should get calculated on the fly, specified by something like <CHARNGRAM><NFROM>2</NFROM><NTO>4</NTO><ADDSTARTSTOP/></CHARNGRAM>
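A minimal sketch of how such a specification could be read and applied is shown below, in Python for brevity. It assumes that NFROM and NTO are inclusive bounds and that ADDSTARTSTOP means padding the string with a start and a stop marker before the n-grams are extracted; both the marker characters (`^` and `$`) and the function names are assumptions for this example, not settled behaviour.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_charngram_spec(xml_text: str) -> Tuple[int, int, bool]:
    """Read NFROM, NTO and the optional ADDSTARTSTOP flag from a CHARNGRAM element."""
    elem = ET.fromstring(xml_text)
    n_from = int(elem.findtext("NFROM"))
    n_to = int(elem.findtext("NTO"))
    add_start_stop = elem.find("ADDSTARTSTOP") is not None
    return n_from, n_to, add_start_stop


def char_ngrams(token: str, n_from: int, n_to: int,
                add_start_stop: bool = False,
                start: str = "^", stop: str = "$") -> List[str]:
    """All character n-grams with n_from <= n <= n_to, in order of increasing n.

    With add_start_stop, the token is padded with the (assumed) marker
    characters first, so n-grams at word boundaries become distinguishable.
    """
    text = f"{start}{token}{stop}" if add_start_stop else token
    grams: List[str] = []
    for n in range(n_from, n_to + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


spec = "<CHARNGRAM><NFROM>2</NFROM><NTO>4</NTO><ADDSTARTSTOP/></CHARNGRAM>"
n_from, n_to, pad = parse_charngram_spec(spec)
features = char_ngrams("cat", n_from, n_to, pad)
```

Generating the n-grams at extraction time like this means nothing extra has to be stored in the document; the cost is paid once per instance when features are generated.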

johann-petrak avatar Aug 10 '17 09:08 johann-petrak