parsing TFS from tokens
Do we have any method to parse the TFS from tokens?
token [
+FORM "cats"
+FROM "4"
+TO "8"
+ID *diff-list* [ LIST *cons* [ FIRST "1" REST *list* ] LAST *list* ]
+TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG "NNS" +PRB "1.0" ] ]
+CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ]
+TRAIT token_trait [
+UW -
+IT italics
+LB bracket_null [ LIST *list* LAST *list* ]
+RB bracket_null [ LIST *list* LAST *list* ]
+LD bracket_null [ LIST *list* LAST *list* ]
+RD bracket_null [ LIST *list* LAST *list* ]
+HD token_head [ +TI "<4:8>"
+LL ctype [ -CTYPE- string ]
+TG string ] ]
+PRED predsort
+CARG "cats"
+TICK +
+ONSET c-or-v-onset ]
please see lkb::read-dag() in
http://svn.delph-in.net/lkb/trunk/src/glue/dag.lsp
in [incr tsdb()] this is invoked by tsdb::reconstruct(), which will recreate the full feature structure associated with the derivation, including any information 'infused' into the lexical entries from the underlying token feature structures, e.g. characterization.
Hi @goodmami and @oepen,
[t.to_dict() for t in result.derivation().preterminals()]
[{'entity': 'the_1',
'id': 149,
'score': -1.639588,
'start': 0,
'end': 1,
'type': 'd_-_the_le',
'form': 'the',
'tokens': [{'id': 91,
'tfs': 'token [ +FORM \\"the\\" +FROM \\"0\\" +TO \\"3\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:3>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"the\\" +TICK + +ONSET c-or-v-onset ]'}]},...
So I tried the LKB code with the string from the tfs field above. Am I right, @oepen?
LKB> (read-dag "token [ +FORM \"the\" +FROM \"0\" +TO \"3\" +ID *diff-list* [ LIST *cons* [ FIRST \"0\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \"DT\" +PRB \"1.0\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \"<0:3>\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \"the\" +TICK + +ONSET c-or-v-onset ]")
NIL
@arademaker addressing your initial question: no, I don't think I ever got around to adding support for parsing those token structures, but I had thought about it. The delphin.tfs.TypedFeatureStructure class should be capable of containing the result once it's parsed, but this TFS format is slightly different from TDL (notice, e.g., that there are no commas between feature values), so we can't just use the TDL parser.
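For anyone who wants to experiment in the meantime, here is a minimal sketch of such a parser. It is not part of PyDelphin: the parse_tfs() helper and its tokenizer are improvised, coreference tags are not handled, and the backslash-escaped quotes that appear in the tfs strings attached to derivation tokens (see the output above) are simply stripped before parsing.

```python
# A sketch, not part of PyDelphin: parse the token TFS (AVM) syntax shown
# above into nested delphin.tfs.TypedFeatureStructure objects.
import re

from delphin.tfs import TypedFeatureStructure

# quoted strings, brackets, or bare symbols (type names and features)
_TOKENIZER = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"|\[|\]|[^\s\[\]]+')


def parse_tfs(s):
    """Parse a token feature structure string into a TypedFeatureStructure."""
    # tfs strings taken from a derivation keep backslash-escaped quotes
    # (see the to_dict() output above); undo that before tokenizing
    tokens = _TOKENIZER.findall(s.replace('\\"', '"'))
    fs, pos = _parse_node(tokens, 0)
    if pos != len(tokens):
        raise ValueError('unexpected material after the feature structure')
    return fs


def _parse_node(tokens, pos):
    """Parse `type [ FEAT value ... ]` starting at tokens[pos]."""
    typename = tokens[pos]
    pos += 1
    featvals = []
    if pos < len(tokens) and tokens[pos] == '[':
        pos += 1
        while tokens[pos] != ']':
            feature = tokens[pos]
            pos += 1
            if tokens[pos].startswith('"'):      # string value
                value = tokens[pos][1:-1]
                pos += 1
            else:                                # typed (possibly nested) value
                value, pos = _parse_node(tokens, pos)
            featvals.append((feature, value))
        pos += 1  # consume the closing ']'
    return TypedFeatureStructure(typename, featvals), pos
```

With the structure at the top of this thread, parse_tfs() should give an object whose .type is 'token' and whose .features() lists the top-level attribute-value pairs, with complex values again being TypedFeatureStructure instances. It knows nothing about the type hierarchy, though, so it only mirrors the string, which is exactly the limitation oepen points out below.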
So I tried the LKB code with the string from the tfs field above,
do you have the right grammar loaded? recreating the token feature structure requires the type hierarchy and constraints available, i.e. a complete unifier.