nlp4j
Inequality symbols resulting in sentence boundaries (maybe)
nlp4j-english & nlp4j-api 1.1.3
The >, <, and >= symbols look like they are being treated as sentence boundaries; = doesn't have this problem.
please show revenue > five -> 2 roots
please show revenue < five -> 2 roots
please show revenue >= five -> 1 root
please show revenue <= five -> 2 roots, 1 of them having a parent
Enter a test sentence: please show revenue > five
id word_form lemma pos_tag feat_map dependency semantic_heads nament_tag
0 @#r$% @#r$% @#r$% _ _:_ _ @#r$%
1 please please UH _ 2:discourse _ O
2 show show VB pos2=VBP 0:root _ O
3 revenue revenue NN _ 2:dobj _ O
4 > > : pos2=SYM 5:punct _ O
5 five #crd# CD _ 0:root _ U-CARDINAL
Enter a test sentence: please show revenue < five
id word_form lemma pos_tag feat_map dependency semantic_heads nament_tag
0 @#r$% @#r$% @#r$% _ _:_ _ @#r$%
1 please please UH _ 2:discourse _ O
2 show show VB pos2=VBP 0:root _ O
3 revenue revenue NN _ 5:compound _ O
4 < < HYPH pos2=: 5:punct _ O
5 five #crd# CD _ 0:root _ U-CARDINAL
Enter a test sentence: please show revenue >= five
id word_form lemma pos_tag feat_map dependency semantic_heads nament_tag
0 @#r$% @#r$% @#r$% _ _:_ _ @#r$%
1 please please UH _ 2:discourse _ O
2 show show VB pos2=VBP 0:root _ O
3 revenue revenue NN _ 2:dobj _ O
4 > > SYM pos2=-RRB- 6:punct _ O
5 = = SYM pos2=CC 6:punct _ O
6 five #crd# CD _ 3:nmod _ U-CARDINAL
Enter a test sentence: please show revenue <= five
id word_form lemma pos_tag feat_map dependency semantic_heads nament_tag
0 @#r$% @#r$% @#r$% _ _:_ _ @#r$%
1 please please UH _ 2:discourse _ O
2 show show VB pos2=VBP 0:root _ O
3 revenue revenue NN _ 2:dobj _ O
4 < < SYM pos2=XX 6:punct _ O
5 = = SYM pos2=CC 6:punct _ O
6 five #crd# CD _ **2:root** _ U-CARDINAL
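To make the root counts above easier to check, here is a minimal sketch of a script that reads pasted output like the blocks above and lists which tokens attach to head 0 and which carry a `root` label while pointing at another token. It is not part of nlp4j; the function names (`parse_rows`, `attached_to_root`, `mislabeled_roots`) are made up for this illustration, and it assumes each token row has exactly eight whitespace-separated columns with the dependency in the sixth column as `head:label`.

```python
def parse_rows(block):
    """Parse pasted parser output into (id, word_form, head, label) tuples.

    Assumption: eight whitespace-separated columns per token row, with the
    dependency column (sixth) written as "head:label".
    """
    rows = []
    for line in block.strip().splitlines():
        cols = line.split()
        # Skip prompts, the header row, and anything that is not a token row.
        if len(cols) != 8 or not cols[0].isdigit():
            continue
        # Strip the "**" emphasis that was added by hand in the report.
        head, _, label = cols[5].strip("*").partition(":")
        if not head.isdigit():
            continue  # the artificial row 0 has "_:_" as its dependency
        rows.append((int(cols[0]), cols[1], int(head), label))
    return rows


def attached_to_root(rows):
    """Word forms of tokens whose head is the artificial root (id 0)."""
    return [form for _, form, head, _ in rows if head == 0]


def mislabeled_roots(rows):
    """Word forms labeled 'root' even though their head is another token."""
    return [form for _, form, head, label in rows if label == "root" and head != 0]


# Output copied from the report for "please show revenue > five".
GT_OUTPUT = """
0 @#r$% @#r$% @#r$% _ _:_ _ @#r$%
1 please please UH _ 2:discourse _ O
2 show show VB pos2=VBP 0:root _ O
3 revenue revenue NN _ 2:dobj _ O
4 > > : pos2=SYM 5:punct _ O
5 five #crd# CD _ 0:root _ U-CARDINAL
"""

# Output copied from the report for "please show revenue <= five".
LE_OUTPUT = """
0 @#r$% @#r$% @#r$% _ _:_ _ @#r$%
1 please please UH _ 2:discourse _ O
2 show show VB pos2=VBP 0:root _ O
3 revenue revenue NN _ 2:dobj _ O
4 < < SYM pos2=XX 6:punct _ O
5 = = SYM pos2=CC 6:punct _ O
6 five #crd# CD _ **2:root** _ U-CARDINAL
"""

if __name__ == "__main__":
    for name, block in (("> five", GT_OUTPUT), ("<= five", LE_OUTPUT)):
        rows = parse_rows(block)
        print(name, attached_to_root(rows), mislabeled_roots(rows))
```

For the `>` case this reports `show` and `five` as both attached to head 0, and for the `<=` case it reports `five` as labeled `root` while its head is token 2, which matches the counts listed in the description.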