CRF Output

Open prakhar21 opened this issue 8 years ago • 2 comments

Hi, I am not able to understand what these tab-separated fields mean.

1            I1      L8      NoCAP  NoPAREN  B-QTY
cup          I2      L8      NoCAP  NoPAREN  B-UNIT
white        I3      L8      NoCAP  NoPAREN  B-NAME
wine         I4      L8      NoCAP  NoPAREN  I-NAME

Please help me out.

Thanks

prakhar21, Jul 15 '16 12:07

@prakhar21 Those are the tokens (words) and their associated features, one token per line. The associated code is here. The column on the right is the tag that we're trying to predict.
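To make the layout concrete, here is a minimal Python sketch that reads rows like the ones in the question. It assumes only what is stated above: the first column is the token, the last column is the tag, and everything in between is a feature. The function names `parse_crf_rows` and `group_by_tag` are made up for this illustration and are not part of the project. The feature names look like a token index (`I1`), a length bucket (`L8`), a capitalization flag (`NoCAP`) and a parentheses flag (`NoPAREN`), but that reading is a guess from the names, so the sketch treats them as opaque strings. The `B-`/`I-` prefixes follow the usual begin/inside tagging convention, which is what lets "white" and "wine" be joined into one name.

```python
SAMPLE = """\
1            I1      L8      NoCAP  NoPAREN  B-QTY
cup          I2      L8      NoCAP  NoPAREN  B-UNIT
white        I3      L8      NoCAP  NoPAREN  B-NAME
wine         I4      L8      NoCAP  NoPAREN  I-NAME
"""


def parse_crf_rows(text):
    """Split each non-empty line into token, feature list, and tag.

    Assumes whitespace-separated columns with the token first and the
    tag last (hypothetical helper, not the project's own code).
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank lines separate phrases; skip them here
        rows.append({
            "token": fields[0],
            "features": fields[1:-1],
            "tag": fields[-1],
        })
    return rows


def group_by_tag(rows):
    """Merge consecutive tokens into (label, text) spans using B-/I- prefixes."""
    spans = []
    for row in rows:
        prefix, sep, label = row["tag"].partition("-")
        if not sep:
            # Tag without a B-/I- prefix: treat the whole tag as the label.
            prefix, label = "", row["tag"]
        if prefix == "I" and spans and spans[-1][0] == label:
            spans[-1][1].append(row["token"])  # continue the current span
        else:
            spans.append((label, [row["token"]]))  # start a new span
    return [(label, " ".join(tokens)) for label, tokens in spans]


if __name__ == "__main__":
    for label, text in group_by_tag(parse_crf_rows(SAMPLE)):
        print(label, "->", text)
```

For the four rows in the question this groups the tokens into a quantity ("1"), a unit ("cup") and a name ("white wine").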

Does that answer your question?

ericagreene, Jul 19 '16 19:07

@ericagreene Thanks, that answers my question. There is one more thing I wanted to clarify. When I train on all 180k examples and then validate on my own dataset, why are the predictions less accurate than those from the model trained on 20k examples? This goes against what I understand about model training: more data should always be better. Please share your thoughts on this.

Thanks

prakhar21, Jul 26 '16 06:07