link-grammar icon indicating copy to clipboard operation
link-grammar copied to clipboard

Dictionary definitions for user API

Open ampli opened this issue 3 years ago • 7 comments

dictionary_lookup_list() and dictionary_lookup_wild() are part of the API, and words that they return, if subscripted, have SUBSCRIPT_MARK in them. However, this symbol is currently not part of the API. Similarly, returned parses may include LEFT-WORD and RIGHT-WORD, which may need to be handled by the user. These symbols are defined in dict-defines.h, which is not part of the user API.

I encountered this problem when I added the rest of the API functions (those that are in dict-api.h and dict-structures.h) to the Python bindings. When I wanted to test dictionary_lookup_list() I found that SUBSCRIPT_MARK needs to be hard-coded. I already had such a problem in link-generator: https://github.com/opencog/link-grammar/blob/5cb9e68e553b6ba39f17b021cdd13b6c6105a414/link-parser/generator-utilities.c#L13-L15

Possible solutions:

  1. Continue to hard-code these constants.
  2. Provide a public .h file with definitions for constants that are used by the API. The file dict-api.h is not appropriate unless we would eventually like to use another include file for non-dictionary constants. It also contains irrelevant (to the API) definitions.
  3. Put the constants in the public API file that contains functions that use them, with a priority to link-includes.h.
  4. Implement a new API call linkgrammar_get_library_definition(const char *), when the argument may be strings like left-wall, 'max-word`, etc. But I think this is less desirable than (3).

I propose to implement (3). It is already implemented in link-includes.h for the lg_error_severity and ConstituentDisplayStyle enums, and also for LG_PANIC_DISJUNCT_COST. Similarly, LEFT_WALL_WORD and RIGHT_WALL_WORD can be moved to it. The SUBSCRIPT_* definitions can be moved to dict-api.h. It is not clear to me what to do with constants like MAX_WORD.

However, if (3) is implemented, there will be a need to include link-includes.h in files that need e.g. `LEFT_WORD

ampli avatar May 26 '21 00:05 ampli

why not just make dict-common/dict-defines.h part of the public API? That is, just install it, and let users use it.

linas avatar May 26 '21 15:05 linas

why not just make dict-common/dict-defines.h part of the public API? That is, just install it, and let users use it.

It currently also contains internal-only definitions:

static const double UNINITIALIZED_MAX_DISJUNCT_COST = -10000.0;
static const double DEFAULT_MAX_DISJUNCT_COST = 2.7;
/* We need some of these as literal strings. */
#define LG_DISJUNCT_COST                        "max-disjunct-cost"
#define LG_DICTIONARY_VERSION_NUMBER            "dictionary-version-number"
#define LG_DICTIONARY_LOCALE                    "dictionary-locale"

However, I guess an added comment can just indicate that (or maybe they can be moved to dict-common/dict-common.h).

In any case, I will include its move to the public API in the PR I am preparing (started as adding lg_exp_resolve(), but got extended, due to the need to test this function, to adding all the rest of the API functions to Python and adding tests).

ampli avatar May 26 '21 16:05 ampli

Yes, the internal-only definitions should be moved to dict-common.h! That would be perfect,

linas avatar May 26 '21 16:05 linas

In order to reduce the size of the said PR, I will first send a PR for moving dict-defines.h to the public API.

ampli avatar May 26 '21 17:05 ampli

There is another definition that is part of the user API but is not exposed: WILDCARD_WORD. I would like to move it from tok-structures.h to dict-defines.h, even though it is only dictionary-related (and not a dictionary definition proper).

ampli avatar May 26 '21 17:05 ampli

It may also be a good idea to expose MAX_SENTENCE. However, it is not dict-related. However, it is related to the max. effective value for word-count arguments of API calls that appear in link-includes.h (parse_options_set_short_length(), parse_options_set_*_null_count((), functions with WordIdx parameter). So can I just add it to link-includes.h?

ampli avatar May 26 '21 17:05 ampli

Yes, MAX_SENTENCE belongs in link-includes.h

linas avatar May 27 '21 16:05 linas