python-crfsuite

Add support for word-embedding-like features that are lists of floats

Open napsternxg opened this issue 9 years ago • 7 comments

The current API doesn't support adding features that are lists of floats, e.g. word embeddings. The current workaround is to write something like {"f0": 1.5, "f1": 1.6, "f2": -1.4} for a 3-dimensional embedding, which puts an extra burden on the user.

I propose a wrapper feature that lets users pass the word embedding list as a value in the feature dictionary, e.g. {"f": FloatFeatures([1.5, 1.6, -1.4])}. Internally, this would convert the float features into a representation consistent with the CRFSuite ItemSequence, using a consistent naming convention such as "f:0", "f:1", "f:2".
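To make the proposal concrete, here is a rough sketch of what such a wrapper could look like in plain Python. `FloatFeatures` and `expand_float_features` are hypothetical names used only for illustration. They are not part of python-crfsuite, and the real implementation would probably live closer to ItemSequence.

```python
from typing import Dict, Sequence, Union


class FloatFeatures:
    """Hypothetical wrapper that marks a list of floats as one vector feature."""

    def __init__(self, values: Sequence[float]) -> None:
        self.values = [float(v) for v in values]


FeatureValue = Union[float, bool, str, FloatFeatures]


def expand_float_features(features: Dict[str, FeatureValue]) -> Dict[str, FeatureValue]:
    """Flatten every FloatFeatures value into "name:index" float features.

    All other entries are copied through unchanged.
    """
    expanded: Dict[str, FeatureValue] = {}
    for name, value in features.items():
        if isinstance(value, FloatFeatures):
            # One scalar feature per embedding dimension: "f:0", "f:1", ...
            for index, component in enumerate(value.values):
                expanded[f"{name}:{index}"] = component
        else:
            expanded[name] = value
    return expanded


item = {"f": FloatFeatures([1.5, 1.6, -1.4]), "bias": 1.0}
print(expand_float_features(item))
```

The expanded dictionary contains only string keys with scalar values, which is the same shape as the manual {"f0": 1.5, ...} workaround described above.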

napsternxg avatar Jun 10 '16 22:06 napsternxg

@kmike and @tpeng do you want to have a look at it?

napsternxg avatar Jun 10 '16 22:06 napsternxg

Using word embeddings improves accuracy a lot. Having a supported way to include them in python-crfsuite would be wonderful.

EmilStenstrom avatar Jan 05 '18 07:01 EmilStenstrom

@napsternxg Any updates on feeding float vectors as features? I am in the same situation: I want to use GloVe embeddings for an NER task using a CRF.

muhXnash avatar Jul 08 '19 15:07 muhXnash

@muhnash0 I basically implemented the approach proposed in my comment manually. It was quite easy.
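For anyone looking for a starting point, doing it manually could look roughly like the sketch below. The tiny `toy_embeddings` table, the feature names, and the out-of-vocabulary flag are all made up for illustration. In practice the vectors would be loaded from a pretrained file such as GloVe.

```python
from typing import Dict, List, Mapping, Sequence


def token_features(token: str, embeddings: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """Build one feature dict for a token, with one float feature per dimension."""
    key = token.lower()
    feats: Dict[str, float] = {"bias": 1.0, "lower=" + key: 1.0}
    vector = embeddings.get(key)
    if vector is None:
        # Assumption: flag unknown words instead of emitting a zero vector.
        feats["emb:oov"] = 1.0
    else:
        for index, component in enumerate(vector):
            feats[f"emb:{index}"] = float(component)
    return feats


def sentence_features(tokens: Sequence[str],
                      embeddings: Mapping[str, Sequence[float]]) -> List[Dict[str, float]]:
    """Return one feature dict per token, in sentence order."""
    return [token_features(token, embeddings) for token in tokens]


# Made-up 3-dimensional vectors, only to show the resulting shape.
toy_embeddings = {"paris": [1.5, 1.6, -1.4]}
xseq = sentence_features(["Paris", "rocks"], toy_embeddings)
print(xseq)
```

The resulting list of dicts is the kind of item sequence that can be passed to `pycrfsuite.Trainer.append(xseq, yseq)` together with the matching label sequence.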

napsternxg avatar Jul 19 '19 19:07 napsternxg

I don't think the proposed approach will work. CRFsuite does not support continuous features so each unique key/value combination will be a unique feature. You have to discretize the continuous features with a technique like https://arxiv.org/abs/1711.01068

DomHudson avatar Mar 18 '20 11:03 DomHudson

@DomHudson crfsuite does support continuous features

kmike avatar Mar 18 '20 12:03 kmike

The approach I suggested is utilized in this tool I have built.

https://github.com/napsternxg/TwitterNER

napsternxg avatar Mar 18 '20 13:03 napsternxg