idn-treebank
What is the meaning of `=` in `SBAR=2`?
I have been trying to list all the tags used in the IDN Treebank. Here is what I got:
{'-LRB-', 'SBARQ', '-RRB-', 'MD', 'SQ', '*EXP*-3', 'SINV-TPC-1', 'ADVP-PRD', ' NP-SBJ', 'PR', 'NP', 'S=1', 'WHADVP', 'SBAR', 'PP-3', 'SYM', 'S-NOM', 'PP-2', 'DT', 'UCP-PRD', 'SBAR-2', 'NND', 'NP-SBJ-4', '*EXP*-1', 'S-1', 'NP-2', 'UH', 'WHNP-1', 'NP-TPC-2', 'NEG', 'NP-1', 'NP-LGS', 'VP', '*-4', 'NP-TPC-1', 'NP-SBJ', 'FW', 'UCP-1', 'PP-SBJ', '*T*-4', 'S-2', '*T*-1', 'S-SBJ', 'Z', 'NP-SBJ=2', 'S-ADV', 'NP=2', 'S-NOM-SBJ', '*RNR*-2', 'NAC-TMP', 'CC', 'NP-3', 'JJ', 'NP-SBJ-1', 'SBAR-NOM', 'SBAR-SBJ', '*-1', 'WH', 'SBAR-PRD', 'SBAR-NOM-SBJ', '*', '*?*', 'S', 'X', 'NP-TTL', 'PP=2', 'QP', 'PP=3', 'SC', '*-5', 'NN', 'PP-PRD', 'CD', '*U*', '*EXP*-2', 'NP-SBJ-2', 'NP-SBJ-6', 'ADVP', 'S-PRD', 'NP=3', 'NP-SBJ-5', '*-6', 'UCP', 'S-TPC-1', 'PRN', 'RB', 'NNP', 'NP-TPC-4', 'IN', '*-3', 'SQ-PRD', 'NP-ADV', 'ADVP=3', '*-2', 'VB', 'NP-TTL-SBJ', 'SBAR-TPC-1', '*T*-2', 'PRP', 'RP', 'NP-SBJ-3', 'NP-TMP', 'ADVP-3', 'UCP-TPC-1', 'S-TTL', 'ADJP', 'SBAR-NOM-SBJ-1', 'S-TPC-2', 'SBAR=2', 'OD', 'ADJP-PRD', 'INTJ', 'CONJP', 'FRAG', 'SINV', 'NP-PRD', 'SINV-1', 'NP-SBJ=1', 'NP=1', 'PP'}
Several tags use `=` instead of `-`. What does it mean? Is it a typo or intentional? I can't find any information about this in either the Penn Treebank paper or the IDN Treebank bracketing guideline.
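To make the question easier to reproduce, here is a minimal sketch of how the labels containing `=` can be isolated from bracketed trees. It assumes the trees are in Penn-style bracketed text; the function names and the sample string are hypothetical and not taken from the treebank or its tooling.

```python
import re

# Matches the label right after each opening bracket,
# e.g. "(NP-SBJ=2 ..." yields "NP-SBJ=2".
LABEL_RE = re.compile(r"\(\s*([^\s()]+)")


def collect_labels(bracketed_text):
    """Return the set of node labels found in Penn-style bracketed text."""
    return set(LABEL_RE.findall(bracketed_text))


def labels_with_equals(labels):
    """Return, in sorted order, only the labels that contain '='."""
    return sorted(label for label in labels if "=" in label)


# Hypothetical sample for illustration only, not a real treebank sentence.
sample = "(S (NP-SBJ=1 (NN a)) (VP (VB b) (NP=2 (NN c))) (SBAR-2 (SC d)))"
labels = collect_labels(sample)
print(labels_with_equals(labels))
```

Running the same filter over the actual treebank files should show which sentences the `=` labels occur in, which may help decide whether they are typos or intentional.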