emnlp2017-bilstm-cnn-crf

Using handcrafted Numerical and Boolean features along with Text

Open karankapoor229 opened this issue 6 years ago • 1 comments

Can you please help me figure out how to feed in numerical and Boolean features along with the text? These features should be used as is.

So instead of putting in just the text like this:

Licence other
No. other
: other
DL-8388568791506 B-id_no
(P) other
N other

I would like to provide OCR-localization-based features along with the text, like this:

Licence None No. None XXXXXXX 1 7 9.667 0 0.002 0.014 0.019 0.0 0.0 0.269 0 1 0 0 7 517 518 0.167 6 0 0 other
No. Licence : None XXX 1 7 9.667 9.667 0.002 0.014 0.019 0.019 0.2 0.269 4 0 0 0 3 517 518 0.333 6 0 0 other
: No. DL-4941170078518 Licence X 1 7 9.667 9.667 0.002 0.014 0.019 0.019 0.333 0.269 5 0 0 0 1 517 518 0.5 6 0 0 other
DL-8388568791506 : (P) No. XXX0000000000000 1 7 9.667 9.667 0.002 0.014 0.019 0.019 0.056 0.269 7 0 0 0 16 517 518 0.667 6 0 0 B-id_no
(P) DL-4941170078518 N : XXX 1 7 9.667 9.667 0.002 0.014 0.019 0.019 0.2 0.269 13 0 0 0 3 517 518 0.833 6 0 0 other
N (P) None DL-4941170078518 X 1 7 33 9.667 0.002 0.014 0.064 0.019 0.185 0.269 14 1 0 0 1 517 518 1.0 6 0 0 other

Any help or suggestions would be highly appreciated. Thanks.

karankapoor229, Aug 04 '18 06:08

Hi, at the moment all features are mapped to embeddings. Features with a discrete (small) number of values would be no problem, for example boolean values. These can be mapped to embeddings as well.

For numeric features, you would need to extend BiLSTM.py around lines 110-118. Instead of adding an input layer plus an embedding layer for your numeric features, you would add only an input layer, so the values reach the network as they are.
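To illustrate the idea, here is a minimal, framework-free sketch of the per-token input vector this would produce. It is not the code from BiLSTM.py: the helper names (`make_embedding_table`, `build_token_input`), the table sizes, and the choice of three numeric values are all assumptions made for the example. Discrete features (the token and a Boolean flag) go through an embedding lookup, while numeric features skip the lookup and are appended unchanged, presumably the same way the embedding outputs are merged before the LSTM.

```python
import random


def make_embedding_table(num_values, dim, seed=0):
    """Hypothetical helper: one random `dim`-sized vector per discrete value."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.25, 0.25) for _ in range(dim)]
            for _ in range(num_values)]


def build_token_input(token_id, bool_flag, numeric_values,
                      token_table, bool_table):
    """Concatenate embedded discrete features with raw numeric features."""
    vector = []
    vector.extend(token_table[token_id])   # token -> embedding lookup
    vector.extend(bool_table[bool_flag])   # Boolean (0/1) -> embedding lookup
    vector.extend(float(v) for v in numeric_values)  # numeric: used as is
    return vector


# Assumed sizes: 6 known tokens with 4-dim embeddings, 2 Boolean values
# with 2-dim embeddings.
token_table = make_embedding_table(num_values=6, dim=4, seed=1)
bool_table = make_embedding_table(num_values=2, dim=2, seed=2)

# Three numeric values picked arbitrarily from the example rows above.
numeric = [9.667, 0.002, 0.269]

vec = build_token_input(token_id=3, bool_flag=1, numeric_values=numeric,
                        token_table=token_table, bool_table=bool_table)
print(len(vec))  # 4 + 2 + 3 dimensions
```

Numeric columns with very different ranges (for example 517 next to 0.002) may need scaling before they are fed in this way, since no embedding layer sits between them and the network.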

nreimers, Aug 04 '18 10:08