link-grammar
link-grammar copied to clipboard
Dictionary definitions for user API
dictionary_lookup_list() and dictionary_lookup_wild() are part of the API, and words that they return, if subscripted, have SUBSCRIPT_MARK in them. However, this symbol is currently not part of the API.
Similarly, returned parses may include LEFT-WORD and RIGHT-WORD, which may need to be handled by the user.
These symbols are defined in dict-defines.h, which is not part of the user API.
I encountered this problem when I added the rest of the API functions (those that are in dict-api.h and dict-structures.h) to the Python bindings. When I wanted to test dictionary_lookup_list() I found that SUBSCRIPT_MARK needs to be hard-coded.
I already had such a problem in link-generator:
https://github.com/opencog/link-grammar/blob/5cb9e68e553b6ba39f17b021cdd13b6c6105a414/link-parser/generator-utilities.c#L13-L15
Possible solutions:
- Continue to hard-code these constants.
- Provide a public
.hfile with definitions for constants that are used by the API. The filedict-api.his not appropriate unless we would eventually like to use another include file for non-dictionary constants. It also contains irrelevant (to the API) definitions. - Put the constants in the public API file that contains functions that use them, with a priority to
link-includes.h. - Implement a new API call
linkgrammar_get_library_definition(const char *), when the argument may be strings likeleft-wall, 'max-word`, etc. But I think this is less desirable than (3).
I propose to implement (3). It is already implemented in link-includes.h for the lg_error_severity and ConstituentDisplayStyle enums, and also for LG_PANIC_DISJUNCT_COST. Similarly, LEFT_WALL_WORD and RIGHT_WALL_WORD can be moved to it. The SUBSCRIPT_* definitions can be moved to dict-api.h. It is not clear to me what to do with constants like MAX_WORD.
However, if (3) is implemented, there will be a need to include link-includes.h in files that need e.g. `LEFT_WORD
why not just make dict-common/dict-defines.h part of the public API? That is, just install it, and let users use it.
why not just make
dict-common/dict-defines.hpart of the public API? That is, just install it, and let users use it.
It currently also contains internal-only definitions:
static const double UNINITIALIZED_MAX_DISJUNCT_COST = -10000.0;
static const double DEFAULT_MAX_DISJUNCT_COST = 2.7;
/* We need some of these as literal strings. */
#define LG_DISJUNCT_COST "max-disjunct-cost"
#define LG_DICTIONARY_VERSION_NUMBER "dictionary-version-number"
#define LG_DICTIONARY_LOCALE "dictionary-locale"
However, I guess an added comment can just indicate that (or maybe they can be moved to dict-common/dict-common.h).
In any case, I will include its move to the public API in the PR I am preparing (started as adding lg_exp_resolve(), but got extended, due to the need to test this function, to adding all the rest of the API functions to Python and adding tests).
Yes, the internal-only definitions should be moved to dict-common.h! That would be perfect,
In order to reduce the size of the said PR, I will first send a PR for moving dict-defines.h to the public API.
There is another definition that is part of the user API but is not exposed: WILDCARD_WORD.
I would like to move it from tok-structures.h to dict-defines.h, even though it is only dictionary-related (and not a dictionary definition proper).
It may also be a good idea to expose MAX_SENTENCE. However, it is not dict-related.
However, it is related to the max. effective value for word-count arguments of API calls that appear in link-includes.h
(parse_options_set_short_length(), parse_options_set_*_null_count((), functions with WordIdx parameter).
So can I just add it to link-includes.h?
Yes, MAX_SENTENCE belongs in link-includes.h