parsing TFS from tokens
Do we have any method to parse the TFS from tokens?
token [
+FORM "cats"
+FROM "4"
+TO "8"
+ID *diff-list* [ LIST *cons* [ FIRST "1" REST *list* ] LAST *list* ]
+TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG "NNS" +PRB "1.0" ] ]
+CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ]
+TRAIT token_trait [
+UW -
+IT italics
+LB bracket_null [ LIST *list* LAST *list* ]
+RB bracket_null [ LIST *list* LAST *list* ]
+LD bracket_null [ LIST *list* LAST *list* ]
+RD bracket_null [ LIST *list* LAST *list* ]
+HD token_head [ +TI "<4:8>"
+LL ctype [ -CTYPE- string ]
+TG string ] ]
+PRED predsort
+CARG "cats"
+TICK +
+ONSET c-or-v-onset ]
please see lkb::read-dag() in
http://svn.delph-in.net/lkb/trunk/src/glue/dag.lsp
in [incr tsdb()] this is invoked by tsdb::reconstruct(), which will recreate the full feature structure associated with the derivation, including any information 'infused' into the lexical entries from the underlying token feature structures, e.g. characterization.
Hi @goodmami and @oepen,
[t.to_dict() for t in result.derivation().preterminals()]
[{'entity': 'the_1',
'id': 149,
'score': -1.639588,
'start': 0,
'end': 1,
'type': 'd_-_the_le',
'form': 'the',
'tokens': [{'id': 91,
'tfs': 'token [ +FORM \\"the\\" +FROM \\"0\\" +TO \\"3\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:3>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"the\\" +TICK + +ONSET c-or-v-onset ]'}]},...
So I tried the LKB code with the string from the tfs field above. Am I right, @oepen?
LKB> (read-dag "token [ +FORM \"the\" +FROM \"0\" +TO \"3\" +ID *diff-list* [ LIST *cons* [ FIRST \"0\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \"DT\" +PRB \"1.0\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \"<0:3>\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \"the\" +TICK + +ONSET c-or-v-onset ]")
NIL
@arademaker addressing your initial question: no, I don't think I ever got around to adding support for parsing those token structures, but I had thought about it. The delphin.tfs.TypedFeatureStructure class should be capable of containing the result once it's parsed, but this TFS format is slightly different from TDL (notice, e.g., that there are no commas between feature values), so we can't just use the TDL parser.
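For anyone who wants to experiment in the meantime, here is a minimal sketch of such a parser. It is not part of PyDelphin: the parse_tfs() helper and its tokenizer are improvised, coreference tags are not handled, and the backslash-escaped quotes that appear in the tfs strings attached to derivation tokens (see the output above) are simply stripped before parsing.

```python
# A sketch, not part of PyDelphin: parse the token TFS (AVM) syntax shown
# above into nested delphin.tfs.TypedFeatureStructure objects.
import re

from delphin.tfs import TypedFeatureStructure

# quoted strings, brackets, or bare symbols (type names and features)
_TOKENIZER = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"|\[|\]|[^\s\[\]]+')


def parse_tfs(s):
    """Parse a token feature structure string into a TypedFeatureStructure."""
    # tfs strings taken from a derivation keep backslash-escaped quotes
    # (see the to_dict() output above); undo that before tokenizing
    tokens = _TOKENIZER.findall(s.replace('\\"', '"'))
    fs, pos = _parse_node(tokens, 0)
    if pos != len(tokens):
        raise ValueError('unexpected material after the feature structure')
    return fs


def _parse_node(tokens, pos):
    """Parse `type [ FEAT value ... ]` starting at tokens[pos]."""
    typename = tokens[pos]
    pos += 1
    featvals = []
    if pos < len(tokens) and tokens[pos] == '[':
        pos += 1
        while tokens[pos] != ']':
            feature = tokens[pos]
            pos += 1
            if tokens[pos].startswith('"'):      # string value
                value = tokens[pos][1:-1]
                pos += 1
            else:                                # typed (possibly nested) value
                value, pos = _parse_node(tokens, pos)
            featvals.append((feature, value))
        pos += 1  # consume the closing ']'
    return TypedFeatureStructure(typename, featvals), pos
```

With the structure at the top of this thread, parse_tfs() should give an object whose .type is 'token' and whose .features() lists the top-level attribute-value pairs, with complex values again being TypedFeatureStructure instances. It knows nothing about the type hierarchy, though, so it only mirrors the string, which is exactly the limitation oepen points out below.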
So I tried the LKB code with the string from the tfs field above,
do you have the right grammar loaded? recreating the token feature structure requires the type hierarchy and constraints available, i.e. a complete unifier.