dkpro-similarity

FunctionWordFrequenciesMeasure is hard coded for English

Open: nicolaierbs opened this issue 9 years ago • 1 comment

Original issue 1 created by dkpro on 2012-06-21T08:19:29.000Z:

The function word list in this measure is hard-coded for English. It should be changed so that the measure can also be used for other languages.

nicolaierbs, Mar 30 '15 23:03

Comment #1 originally posted by dkpro on 2012-07-10T08:32:29.000Z:

Added a parameter that allows supplying a custom list of function words.

The core of the issue, however, is still unaddressed: the English function word list is still used as the default.

The measure should check the CAS language and then try to load the corresponding default list, or throw an initialization exception if no such list can be found.
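
The lookup proposed above could be sketched roughly as follows. This is only an illustration, not the measure's actual code: the class name `FunctionWordListResolver`, the resource naming scheme `function-words_<language>.txt`, and the use of `IllegalStateException` as a stand-in for a UIMA initialization exception are all assumptions. The resource lookup is passed in as a function so the sketch stays independent of how the lists are actually packaged.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper; not part of dkpro-similarity.
public class FunctionWordListResolver {

    // Assumed naming scheme for per-language default lists.
    static String resourceNameFor(String language) {
        return "function-words_" + language + ".txt";
    }

    // Loads the default list for the given language code (e.g. the CAS
    // document language). Fails instead of silently falling back to English.
    static Set<String> loadDefaultList(String language,
                                       Function<String, InputStream> lookup) {
        if (language == null || language.isEmpty()) {
            throw new IllegalStateException(
                "No document language set; cannot choose a function word list");
        }
        String name = resourceNameFor(language);
        InputStream in = lookup.apply(name);
        if (in == null) {
            // Stand-in for an initialization exception in the real component.
            throw new IllegalStateException(
                "No default function word list for language '" + language
                + "' (looked for " + name + ")");
        }
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word); // one function word per non-empty line
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

If the lists were shipped on the classpath, the lookup could be something like `name -> FunctionWordListResolver.class.getResourceAsStream(name)`, which returns `null` for a missing resource and so triggers the exception. A user-supplied list (the parameter added above) would presumably take precedence over this default lookup.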

nicolaierbs, Mar 30 '15 23:03