
dictionary_lookup_list does not look up regexes!

Open linas opened this issue 8 years ago • 5 comments

dictionary_lookup_list() was added to the public dictionary API so that other users (specifically, the sureal (surface realization) and microplanning modules for sentence generation) could look up words in the LG dictionary. And that's mostly fine, except that it does not look up regexes. So:

-- the LG internals need to be jiggered around, so that the public API does look up regexes.

A related issue is that the public API exposes the Exp structure, which has a bizarre design. It needs to be reworked so that it's cleaner and nicer for the ordinary user. Unfortunately, this is a lot of work.

linas avatar Aug 12 '17 04:08 linas

Do you mean something like that (and the same for db_lookup_list()):

Dict_node * file_lookup_list(const Dictionary dict, const char *s)
{
	Dict_node * llist =
		rdictionary_lookup(NULL, dict->root, s, true, dict_order_bare);
	llist = prune_lookup_list(llist, s);
	if (NULL != llist) return llist;

	const char *regex_name = match_regex(dict->regex_root, s);
	if (regex_name) return file_lookup_list(dict, regex_name);

	return NULL;
}
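To make the fallback order concrete, here is a minimal, self-contained sketch of "exact lookup first, then regex class". It does not use the LG internals: ToyEntry, ToyRegex, toy_lookup() and the two tables are hypothetical stand-ins, the expressions are made up, and POSIX <regex.h> is assumed in place of the library's own regex matcher.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the dictionary and its regex table. */
typedef struct { const char *word; const char *expr; } ToyEntry;
typedef struct { const char *name; const char *pattern; } ToyRegex;

static const ToyEntry toy_words[] = {
	{ "cat",     "D- & S+" },    /* made-up expressions */
	{ "NUMBERS", "NM+ or D+" },  /* entry named after a regex class */
};
static const ToyRegex toy_regexes[] = {
	{ "NUMBERS", "^[0-9]+$" },
};

/* Exact string match against the word table. */
static const ToyEntry *toy_exact_lookup(const char *s)
{
	for (size_t i = 0; i < sizeof toy_words / sizeof toy_words[0]; i++)
		if (strcmp(toy_words[i].word, s) == 0) return &toy_words[i];
	return NULL;
}

/* Return the name of the first regex class that matches s, or NULL. */
static const char *toy_match_regex(const char *s)
{
	for (size_t i = 0; i < sizeof toy_regexes / sizeof toy_regexes[0]; i++)
	{
		regex_t re;
		if (regcomp(&re, toy_regexes[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
			continue; /* skip patterns that fail to compile */
		int rc = regexec(&re, s, 0, NULL, 0);
		regfree(&re);
		if (rc == 0) return toy_regexes[i].name;
	}
	return NULL;
}

/* Exact lookup first; on a miss, look up the matching regex class name.
 * *regex_name (if non-NULL) reports which class was used, or NULL. */
const ToyEntry *toy_lookup(const char *s, const char **regex_name)
{
	assert(s != NULL);
	if (regex_name) *regex_name = NULL;

	const ToyEntry *e = toy_exact_lookup(s);
	if (e != NULL) return e;

	const char *name = toy_match_regex(s);
	if (name == NULL) return NULL;
	if (regex_name) *regex_name = name;
	return toy_exact_lookup(name);
}
```

One design choice in this sketch: the class name is resolved with the exact lookup only, not by calling toy_lookup() again, so a class name that is missing from the word table cannot trigger a second round of regex matching.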

Regarding the expression format, what would be considered a nice format? Is an ASCII representation like the output of expression_stringify() better than the current C structure?

ampli avatar Oct 04 '17 02:10 ampli

> Do you mean something like that

Yes.

> stringify()

Yes, that would probably be best: return a string that resembles the current ASCII dictionary format. I don't recall exactly what expression_stringify() prints.

linas avatar Oct 04 '17 04:10 linas

> I don't recall exactly what expression_stringify() prints.

It prints the expression in the current dictionary format. So the question now is how to implement it.

I guess we will need a new API. The easiest way seems to be to have 2 API functions, something like:

dictionary_lookup_words() # Return list of words
dictionary_lookup_exp() # Return list of corresponding expressions

Or maybe we can have one function dictionary_lookup() that returns a list word1, exp1, word2, exp2, ... etc.

One question is whether we need to add a third component that indicates whether the word has been resolved through a regex.
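For comparison, the single-function variant could return records instead of an interleaved word/exp list, which leaves room for the "resolved through a regex" component. This is only a sketch: LookupEntry, LookupResult, lookup_result_add() and lookup_result_clear() are hypothetical names, not existing LG API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result record: one per matching dictionary entry. */
typedef struct {
	char *word;      /* e.g. "word1.s2" */
	char *exp;       /* expression in ASCII dictionary format */
	bool via_regex;  /* true if the word was resolved through a regex */
} LookupEntry;

typedef struct {
	size_t num;
	LookupEntry *entry;
} LookupResult;

/* Plain C replacement for strdup(), which is POSIX rather than ISO C. */
static char *dup_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);
	if (copy != NULL) memcpy(copy, s, n);
	return copy;
}

/* Append a copy of (word, exp, via_regex); returns false on allocation failure. */
bool lookup_result_add(LookupResult *r, const char *word, const char *exp,
                       bool via_regex)
{
	assert(r != NULL && word != NULL && exp != NULL);

	LookupEntry *grown = realloc(r->entry, (r->num + 1) * sizeof *grown);
	if (grown == NULL) return false;
	r->entry = grown;

	char *w = dup_string(word);
	char *e = dup_string(exp);
	if (w == NULL || e == NULL) { free(w); free(e); return false; }

	r->entry[r->num] = (LookupEntry){ w, e, via_regex };
	r->num++;
	return true;
}

/* Free all copies and reset the result to the empty state. */
void lookup_result_clear(LookupResult *r)
{
	for (size_t i = 0; i < r->num; i++)
	{
		free(r->entry[i].word);
		free(r->entry[i].exp);
	}
	free(r->entry);
	r->entry = NULL;
	r->num = 0;
}
```

A caller would start from `LookupResult r = { 0, NULL };`, iterate over `r.entry[0 .. r.num-1]`, and finish with lookup_result_clear(&r). Adding a field later does not change how existing callers index the list, which is harder to guarantee with a flat word1, exp1, word2, exp2 sequence.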

Or maybe we can use a JSON format, which can be extended in a compatible way for any future need, and API users can use a JSON library function to decode it, if desired. We can use such a JSON API as a future model for some additional APIs we still need to add.

ampli avatar Oct 04 '17 09:10 ampli

> JSON format

Yes, I like that best! Some of the current APIs could/should be provided in JSON.

linas avatar Nov 08 '17 17:11 linas

Proposal:

const char *linkgrammar_get_dict_word(Dictionary dict, const char *word);

JSON example:

 {
   "numentries": 5,
   "entries":
     [
       {
         "word": "word1.s2",
         "regex-name": null,
         "idiom": false,
         "expr": "(((dWV- or dCV- or dIV-) & {VC+}) or [()])"
       },
       {
         "word": "word2.s3",
         ...
       }
     ]
 }
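As a rough illustration of how one element of "entries" in the JSON example above could be produced without a JSON library, the sketch below formats a single entry and escapes the strings. json_quote() and json_format_entry() are hypothetical helpers, and the fixed-size scratch buffers are an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write `in` to `out` as a quoted JSON string, escaping '"', '\\' and
 * control characters. Returns false if `out` is too small. */
static bool json_quote(const char *in, char *out, size_t size)
{
	size_t o = 0;
	if (size < 3) return false;
	out[o++] = '"';
	for (; *in != '\0'; in++)
	{
		unsigned char c = (unsigned char)*in;
		/* Worst case: 6 bytes of \u00XX, plus closing quote and NUL. */
		if (o + 8 > size) return false;
		if (c == '"' || c == '\\') { out[o++] = '\\'; out[o++] = (char)c; }
		else if (c < 0x20) o += (size_t)snprintf(out + o, 7, "\\u%04x", c);
		else out[o++] = (char)c; /* UTF-8 bytes pass through unchanged */
	}
	out[o++] = '"';
	out[o] = '\0';
	return true;
}

/* Format one entry object into buf. regex_name may be NULL, which is
 * emitted as JSON null. Returns the length written, or -1 if it does not fit. */
int json_format_entry(char *buf, size_t size, const char *word,
                      const char *regex_name, bool idiom, const char *expr)
{
	char w[256], r[256], e[1024]; /* scratch sizes are arbitrary */
	assert(buf != NULL && word != NULL && expr != NULL);

	if (!json_quote(word, w, sizeof w)) return -1;
	if (!json_quote(expr, e, sizeof e)) return -1;
	if (regex_name == NULL) strcpy(r, "null");
	else if (!json_quote(regex_name, r, sizeof r)) return -1;

	int n = snprintf(buf, size,
		"{\"word\": %s, \"regex-name\": %s, \"idiom\": %s, \"expr\": %s}",
		w, r, idiom ? "true" : "false", e);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Since the proposal returns a const char *, a real implementation would also have to decide who owns the returned string; this sketch sidesteps that by writing into a caller-supplied buffer.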

I don't know if extending it this way may be useful:

  "base": "word1",
  "subscript": "s2",
  "dnf-expr": [ {"cost": 0, "expr": ["A-", "B+", "C+"]}, {"cost": 2, "expr": ["D-"]}, ... ]
(or put the - and + connectors in separate arrays.)

ampli avatar Mar 22 '18 09:03 ampli