parsing TFS from tokens

Open arademaker opened this issue 3 years ago • 4 comments

Do we have any method to parse the TFS from tokens?

token [
+FORM "cats"
+FROM "4"
+TO "8"
+ID *diff-list* [ LIST *cons* [ FIRST "1" REST *list* ] LAST *list* ]
+TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG "NNS" +PRB "1.0" ] ]
+CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ]
+TRAIT token_trait [ 
 +UW -
 +IT italics
 +LB bracket_null [ LIST *list* LAST *list* ]
 +RB bracket_null [ LIST *list* LAST *list* ]
 +LD bracket_null [ LIST *list* LAST *list* ]
 +RD bracket_null [ LIST *list* LAST *list* ]
 +HD token_head [ +TI "<4:8>"
   +LL ctype [ -CTYPE- string ]
   +TG string ] ]
+PRED predsort
+CARG "cats"
+TICK +
+ONSET c-or-v-onset ]

arademaker avatar Aug 06 '22 23:08 arademaker

Do we have any method to parse the TFS from tokens?

please see lkb::read-dag() in

http://svn.delph-in.net/lkb/trunk/src/glue/dag.lsp

in [incr tsdb()] this is invoked by tsdb::reconstruct(), which will recreate the full feature structure associated with the derivation, including any information 'infused' into the lexical entries from the underlying token feature structures, e.g. characterization.

oepen avatar Aug 07 '22 06:08 oepen

Hi @goodmami and @oepen,

[t.to_dict() for t in result.derivation().preterminals()]
[{'entity': 'the_1',
  'id': 149,
  'score': -1.639588,
  'start': 0,
  'end': 1,
  'type': 'd_-_the_le',
  'form': 'the',
  'tokens': [{'id': 91,
    'tfs': 'token [ +FORM \\"the\\" +FROM \\"0\\" +TO \\"3\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:3>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"the\\" +TICK + +ONSET c-or-v-onset ]'}]},...

So I tried the LKB code with the string from the tfs field above. Am I right, @oepen?

LKB> (read-dag "token [ +FORM \"the\" +FROM \"0\" +TO \"3\" +ID *diff-list* [ LIST *cons* [ FIRST \"0\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \"DT\" +PRB \"1.0\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \"<0:3>\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \"the\" +TICK + +ONSET c-or-v-onset ]")
NIL

arademaker avatar Aug 12 '22 22:08 arademaker

@arademaker addressing your initial question: no, I don't think I ever got around to adding support for parsing those token structures, but I had thought about it. The delphin.tfs.TypedFeatureStructure class should be capable of containing it once it's parsed, but this TFS format is slightly different from TDL (notice, e.g., that there are no commas between feature values), so we can't just use the TDL parser.
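To illustrate what such a parser might involve, here is a minimal sketch of a standalone recursive-descent reader for this token format. It is not part of pydelphin: the names parse_token_tfs and _parse_node are hypothetical, and the grammar is only inferred from the examples in this thread (a type or quoted string, optionally followed by a bracketed run of feature/value pairs with no commas). It builds plain (type, features) tuples rather than TypedFeatureStructure objects, and it does no unification against a grammar's type hierarchy.

```python
import re

# Tokens: a double-quoted string, a bracket, or any run of other
# non-space characters (type and feature names such as *list*, +FORM, -, +).
# Assumption: this lexical shape is inferred from the examples in this thread.
_TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|\[|\]|[^\s\[\]"]+')


def parse_token_tfs(s):
    """Parse a token TFS string into a nested (type, {feature: node}) tuple.

    Quoted strings are kept with their quotes as the "type" of a leaf node,
    so they stay distinguishable from type names.
    """
    tokens = _TOKEN_RE.findall(s)
    node, pos = _parse_node(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing token: {tokens[pos]!r}")
    return node


def _parse_node(tokens, pos):
    """Parse one node starting at tokens[pos]; return (node, next_pos)."""
    if pos >= len(tokens) or tokens[pos] in ("[", "]"):
        raise ValueError("expected a type name or a quoted string")
    head = tokens[pos]
    pos += 1
    feats = {}
    if pos < len(tokens) and tokens[pos] == "[":
        pos += 1
        # Inside brackets, tokens strictly alternate: feature, then value.
        while pos < len(tokens) and tokens[pos] != "]":
            feat = tokens[pos]
            if feat == "[":
                raise ValueError("expected a feature name, found '['")
            value, pos = _parse_node(tokens, pos + 1)
            feats[feat] = value
        if pos >= len(tokens):
            raise ValueError("missing closing bracket")
        pos += 1  # consume the closing bracket
    return (head, feats), pos


example = 'token [ +FORM "cats" +FROM "4" +TO "8" +TICK + +ONSET c-or-v-onset ]'
type_name, feats = parse_token_tfs(example)
print(type_name, feats["+FORM"], feats["+TICK"])
```

Because features and values strictly alternate inside the brackets, a bare + or - (as in +TICK + or +UW -) is read as a value without ambiguity. The sketch expects unescaped quotes, as in the first post; the tfs strings shown in the to_dict() output above carry backslash-escaped quotes, which would need to be unescaped before parsing.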

goodmami avatar Aug 13 '22 02:08 goodmami

So I tried the LKB code with the string from the tfs field above,

do you have the right grammar loaded? recreating the token feature structure requires the type hierarchy and constraints to be available, i.e. a complete unifier.

oepen avatar Aug 13 '22 06:08 oepen