crfsuite

meaning of min_freq

Open marctorsoc opened this issue 7 years ago • 3 comments

Hi, I was wondering what the min_freq param means. The documentation says it's a float, so I was always convinced it was a number in the range [0, 1] (a fraction), but then I've seen other examples that set it to e.g. 5.0.

Is it then the absolute frequency of a feature (i.e. the number of times a feature appears in the training data)?

Is the threshold applied over the entire training set, or per document?

Thanks!

marctorsoc • Oct 26 '18 17:10

According to the doc: "Cut-off threshold for occurrence frequency of a feature. CRFsuite will ignore features whose frequencies of occurrences in the training data are no greater than VALUE. The default value is 0 (i.e., no cut-off)."

The example: $ crfsuite learn -m CRF.model -p feature.minfreq=2 train.txt

With this setting, a feature is removed if it appears only once. Given how a CRF is trained, the threshold only makes sense at the level of the whole dataset, not per document.

You can check the behavior by inspecting the model. Run this command to dump the model in text format: $ crfsuite dump CRF.model > CRF.model.txt

A feature that appeared only once in train.txt shouldn't be in the model. I am assuming that you didn't set the c1 parameter to a non-zero value, since L1 regularization prunes features as well.
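To cross-check the dump, a rough sketch like the following can count how often each (attribute, label) pair occurs across a training file. It assumes the usual CRFsuite layout (one item per line, label first, tab-separated attributes, blank lines between sequences) with no `:value` weights or escaped characters, and it only covers state features, not label transitions. The function name and the sample data are made up for illustration.

```python
from collections import Counter


def count_state_features(lines):
    """Count (attribute, label) pairs over the whole dataset (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank line marks a sequence boundary; counts keep accumulating
        label, *attrs = line.split("\t")
        for attr in attrs:
            counts[(attr, label)] += 1
    return counts


# Illustrative data: two short sequences in the assumed format.
sample = [
    "B-NP\tw=the\tpos=DT",
    "I-NP\tw=cat\tpos=NN",
    "",
    "B-NP\tw=the\tpos=DT",
    "I-NP\tw=dog\tpos=NN",
]

counts = count_state_features(sample)
# Pairs seen exactly once in the whole dataset.
rare = sorted(pair for pair, n in counts.items() if n == 1)
print(rare)
```

With feature.minfreq=2, the pairs listed in `rare` are the ones expected to be missing from CRF.model.txt. For a real file, pass `open("train.txt")` instead of `sample`.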

usptact • Oct 26 '18 17:10

Thank you very much, awesome answer

marctorsoc • Oct 27 '18 07:10

Sorry, I'm not sure I understand. You say that features are omitted if their frequency is no greater than VALUE.

If with 0 the ones appearing once are not removed, then with 2 are the ones appearing twice removed as well?

Maybe a feature is removed when its frequency is strictly less than VALUE? But then 1 and 0 would behave the same...
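The two readings in question can be put side by side with a small sketch. This does not call CRFsuite; the two predicate functions are hypothetical and only encode the competing interpretations of the cut-off.

```python
def removed_literal(freq, value):
    """Literal reading of the docs: drop a feature if freq is no greater than VALUE."""
    return freq <= value


def removed_strict(freq, value):
    """Alternative reading: drop a feature only if freq is strictly less than VALUE."""
    return freq < value


# Compare both readings for small thresholds and frequencies.
table = {}
for value in (0, 1, 2):
    for freq in (1, 2, 3):
        table[(value, freq)] = (removed_literal(freq, value), removed_strict(freq, value))
        print(f"minfreq={value} freq={freq} "
              f"literal={table[(value, freq)][0]} strict={table[(value, freq)][1]}")
```

Under the strict reading, 0 and 1 behave the same for any feature seen at least once, and minfreq=2 removes exactly the features seen once, which matches the example given in the answer. Under the literal reading, minfreq=2 would also remove features seen twice. Dumping a model trained on a tiny dataset is the way to see which of the two the installed version actually follows.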

marctorsoc • Oct 29 '18 11:10