jitar
(Enhancement) Add "hasCapitalizationInfo" option
jitar seems to be tagset-agnostic except for one line in HMMTagger, which assumes that tags are prefixed with the capitalization info added by the FrequenciesCollector:
String tag = d_numberTags.get(tagNumber).substring(2);
An option controlling whether the first 2 characters are trimmed would allow an arbitrary tagset to be used and would also support models created in earlier jitar versions.
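A rough sketch of what I have in mind, in case it helps. This is not jitar code: the class `TagNameSketch`, its constructor, and the method `tagFor` are made up for illustration, and only the flag name `hasCapitalizationInfo` and the `substring(2)` call come from this issue. How the option would actually be wired into HMMTagger is left open.

```java
import java.util.List;

// Hypothetical helper, not part of jitar: shows how a boolean option
// could gate the prefix stripping that HMMTagger currently always does.
final class TagNameSketch {
    // Number of leading characters removed by the existing substring(2) call.
    private static final int CAPITALIZATION_PREFIX_LENGTH = 2;

    private final List<String> numberTags;
    private final boolean hasCapitalizationInfo;

    TagNameSketch(List<String> numberTags, boolean hasCapitalizationInfo) {
        this.numberTags = numberTags;
        this.hasCapitalizationInfo = hasCapitalizationInfo;
    }

    // Returns the tag for the given number, trimming the prefix only
    // when the model's tags are known to carry capitalization info.
    String tagFor(int tagNumber) {
        String tag = numberTags.get(tagNumber);
        return hasCapitalizationInfo
                ? tag.substring(CAPITALIZATION_PREFIX_LENGTH)
                : tag;
    }
}
```

With the flag set to `true` the behaviour matches the current line; with `false` the tag is returned unchanged, which is what a model without the prefix would need.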