parser
remove concept of 'word'
This library has the concepts of `word`, `phrase` and `section`.
I'm not sure the `word` concept is required, as it can be represented as a single-token `phrase`.
In fact, I think there is duplication between `words` and single-token `phrases` right now.
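To illustrate the overlap, here is a minimal sketch in which a word is nothing more than a phrase holding exactly one token. The `Phrase` class, `isSingleToken()` and `permutePhrases()` are hypothetical names made up for this example and are not taken from the library's code.

```javascript
// Hypothetical sketch: these names are illustrative, not the library's API.
class Phrase {
  constructor (tokens, from) {
    this.tokens = tokens // array of token strings
    this.from = from // index of the first token within the section
    this.body = tokens.join(' ')
  }

  // a "word" is simply a phrase with exactly one token
  isSingleToken () {
    return this.tokens.length === 1
  }
}

// Generate every contiguous run of tokens in a section, including
// the single-token runs that would otherwise be stored as words.
function permutePhrases (section) {
  const tokens = section.split(/\s+/).filter(Boolean)
  const phrases = []
  for (let from = 0; from < tokens.length; from++) {
    for (let to = from + 1; to <= tokens.length; to++) {
      phrases.push(new Phrase(tokens.slice(from, to), from))
    }
  }
  return phrases
}

const phrases = permutePhrases('100 Main Street')

// the "words" fall out of the phrase list without a separate type
const words = phrases.filter((p) => p.isSingleToken())
```

Under this assumption, anything that currently asks for words could filter the phrase list instead, which is where the duplication would disappear.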
If possible, it would be nice to remove the concept of a `word`, which should help clean up the code.
@Joxit :+1: or :-1: on this? I might have a look at doing it at some point when I have some time, but it will be a fairly noisy commit.
I had a think about how this might work. Before attempting this, I think we should first focus on the `graph`:
- document existing graph relationships in readme
- possibly refactor or rename graph relationships for clarity (and to have different verbs for `phrase` and `word` relationships)
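As a rough idea of what distinct verbs could look like, the sketch below uses a tiny stand-in graph. The `SketchGraph` class and the relationship names (`firstWord`, `lastWord`, `nextWord`, `prevWord`, `startsPhrase`) are all assumptions made for illustration; they are not the relationships the library currently defines.

```javascript
// Hypothetical stand-in for a per-span graph of named relationships.
class SketchGraph {
  constructor () {
    this.edges = new Map() // relationship name -> Set of linked nodes
  }

  add (relationship, node) {
    if (!this.edges.has(relationship)) {
      this.edges.set(relationship, new Set())
    }
    this.edges.get(relationship).add(node)
  }

  findAll (relationship) {
    return Array.from(this.edges.get(relationship) || [])
  }

  findOne (relationship) {
    const all = this.findAll(relationship)
    return all.length > 0 ? all[0] : null
  }
}

const makeSpan = (body) => ({ body, graph: new SketchGraph() })

const phrase = makeSpan('Main Street')
const main = makeSpan('Main')
const street = makeSpan('Street')

// phrase -> word links use one set of verbs...
phrase.graph.add('firstWord', main)
phrase.graph.add('lastWord', street)

// ...word -> word links use another...
main.graph.add('nextWord', street)
street.graph.add('prevWord', main)

// ...and word -> phrase links use a third, so a call site always
// says which kind of span it expects to get back.
main.graph.add('startsPhrase', phrase)
```

The point of the separate verbs is that a caller reading `findOne('firstWord')` knows the result is a single-token span without checking its type.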
Once this is done it should be possible to delete all single-word `phrase` objects and simply replace them with a pointer to the `word` span.
The main benefit of doing this refactor would be to clean up all the `classifier` and `solver` logic, which can get quite verbose and complex.
So if we have an idea of what we'd like the graph calls to look like to improve this then we can go ahead and start introducing new graph relationships to support them.
Since relationships in the graph are cheap, we can safely build up a range of links. It's also fairly easy to monitor the use of graph relationships we'd like to deprecate (link child?) and work to gradually replace them with other relationships until the codebase is 100% migrated.
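One way the monitoring could work is to count every access to a relationship that has been flagged for removal. This is only a sketch: `MonitoredGraph`, its `deprecated` list and the `deprecatedUse` counter are hypothetical, and the library's graph may expose different methods.

```javascript
// Hypothetical sketch: a graph that counts use of deprecated relationships.
class MonitoredGraph {
  constructor (deprecated = []) {
    this.edges = new Map() // relationship name -> Set of nodes
    this.deprecated = new Set(deprecated) // names we want to phase out
    this.deprecatedUse = new Map() // name -> number of add/find calls
  }

  // record one use if the relationship is flagged as deprecated
  track (relationship) {
    if (!this.deprecated.has(relationship)) return
    const count = this.deprecatedUse.get(relationship) || 0
    this.deprecatedUse.set(relationship, count + 1)
  }

  add (relationship, node) {
    this.track(relationship)
    if (!this.edges.has(relationship)) {
      this.edges.set(relationship, new Set())
    }
    this.edges.get(relationship).add(node)
  }

  findAll (relationship) {
    this.track(relationship)
    return Array.from(this.edges.get(relationship) || [])
  }
}

const monitored = new MonitoredGraph(['child'])
monitored.add('child', 'Main') // counted: deprecated relationship
monitored.add('firstWord', 'Main') // not counted
monitored.findAll('child') // counted again
monitored.findAll('firstWord') // not counted
```

Once the counter stays empty across the test suite, the deprecated relationship is no longer in use and can be deleted.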
At some point we can remove the `WordClassifier` completely so it's no longer possible to classify `word` spans directly.
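For illustration, a classifier that used to operate on words could instead be written against phrases and simply skip anything longer than one token. The class names, the `labels` set and the `numeric` label below are invented for this sketch and do not reflect the library's classifier interface.

```javascript
// Hypothetical sketch: a phrase-only base class with no word-level entry point.
class PhraseOnlyClassifier {
  classify (phrases) {
    for (const phrase of phrases) {
      this.each(phrase)
    }
  }

  each (phrase) {
    throw new Error('each() must be implemented by a subclass')
  }
}

// A classifier that only cares about single tokens restricts itself
// to single-token phrases rather than asking for words.
class NumericClassifier extends PhraseOnlyClassifier {
  each (phrase) {
    if (phrase.tokens.length !== 1) return
    if (/^\d+$/.test(phrase.tokens[0])) {
      phrase.labels.add('numeric')
    }
  }
}

const candidates = [
  { tokens: ['100'], labels: new Set() },
  { tokens: ['100', 'Main'], labels: new Set() },
  { tokens: ['Main'], labels: new Set() }
]

new NumericClassifier().classify(candidates)
```

With this shape there is a single code path for classification, and the single-token check lives in the classifiers that need it.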