replaCy
Automatically infer syntactic match refinements
When a match includes positive and negative test sentences, we should be able to infer POS constraints for the spaCy pattern.
This is cool by itself, but there is at least one other thing we can do. If the match is a verb, and the verb is transitive in the positive examples but intransitive in the negative examples (or the other way around), we could automatically match only when the verb is used transitively (resp. intransitively).
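A minimal sketch of the transitivity check, assuming a verb counts as transitive when one of its dependency children is a direct object (`dobj` in spaCy's English models, `obj` in Universal Dependencies). `Tok`, `is_transitive` and `infer_transitivity` are hypothetical names, not part of replaCy. `Tok` is a hand-built stand-in for a spaCy token so the example runs without a model, and the two trees below are written by hand rather than produced by a parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Dependency labels treated as "direct object" (assumption, see lead-in).
OBJECT_DEPS = {"dobj", "obj"}


@dataclass
class Tok:
    """Stand-in for a spaCy Token: only the attributes used below."""
    text: str
    dep_: str
    children: List["Tok"] = field(default_factory=list)


def is_transitive(verb) -> bool:
    """True if any child of the verb is a direct object."""
    return any(child.dep_ in OBJECT_DEPS for child in verb.children)


def infer_transitivity(positive_verbs, negative_verbs) -> Optional[bool]:
    """Return the transitivity to require, or None if it does not
    cleanly separate the positive examples from the negative ones."""
    pos = {is_transitive(v) for v in positive_verbs}
    neg = {is_transitive(v) for v in negative_verbs}
    if len(pos) == 1 and len(neg) == 1 and pos != neg:
        return pos.pop()
    return None


# Hand-built trees for "She runs a company" and "She runs every morning".
transitive_use = Tok("runs", "ROOT", [Tok("She", "nsubj"), Tok("company", "dobj")])
intransitive_use = Tok("runs", "ROOT", [Tok("She", "nsubj"), Tok("morning", "npadvmod")])

required = infer_transitivity([transitive_use], [intransitive_use])
```

Here `required` is `True`, meaning "only match transitive uses". Passives such as "Proof of ID will be required." carry no direct object, so a real implementation would probably need to treat passive subjects separately.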
Hmm... I kind of like having the test matches be true, e.g. test.positive sentences are sentences which CURRENTLY are matched against, and test.negative sentences are ones that don't trigger a match.
Maybe this would look like a separate file, replacy_from_examples.json, which would look like:
{
  "require": {
    "positive": [
      ["I require more food", "I need more food"],
      ["Proof of ID will be required.", "Proof of ID will be needed."]
    ],
    "negative": [
      "But I satisfy all the requirements in the job posting!"
    ],
    "features": ["pos_", "tag_", "dep_"],
    "allowed_hooks": ["hook names", "this might be crazy though, for v1 we probably can't support hooks, possibly ever"]
  }
}
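One way the "features" list could be used, as a rough sketch: keep a feature as a constraint only when every positive token agrees on its value and no negative token has that value. `infer_constraints` is a hypothetical helper, not an existing replaCy function. The feature values below are written by hand for illustration; in practice they would come from something like `getattr(token, feature)` on parsed spaCy tokens.

```python
from typing import Dict, List


def infer_constraints(
    positive: List[Dict[str, str]],
    negative: List[Dict[str, str]],
    features: List[str],
) -> Dict[str, str]:
    """Keep a feature only if all positive tokens share one value for it
    and no negative token has that value."""
    constraints = {}
    for feature in features:
        values = {tok[feature] for tok in positive}
        if len(values) != 1:
            continue  # positives disagree (or are empty): cannot require it
        (value,) = values
        if all(tok[feature] != value for tok in negative):
            constraints[feature] = value
    return constraints


# Hand-written (not parser-produced) values for the tokens of interest.
positive_tokens = [
    {"pos_": "VERB", "tag_": "VBP", "dep_": "ROOT"},  # "require"
    {"pos_": "VERB", "tag_": "VBN", "dep_": "ROOT"},  # "required"
]
negative_tokens = [
    {"pos_": "NOUN", "tag_": "NNS", "dep_": "dobj"},  # "requirements"
]

inferred = infer_constraints(
    positive_tokens, negative_tokens, ["pos_", "tag_", "dep_"]
)
```

With these values `inferred` is `{"pos_": "VERB", "dep_": "ROOT"}`: `tag_` is dropped because the positives disagree. The `dep_` constraint is probably too strict and only survives because there are so few examples, which suggests the builder would need a reasonable number of sentences per rule.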
Then we would run python -m replacy.builder replacy_from_examples.json and it would auto-generate a match_dict.json?
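A possible first step for such a builder, sketched under the assumption that each "positive" entry is a pair of [original sentence, sentence after replacement]: diff the two sentences word by word to recover what was matched and what it was replaced with. `changed_spans` and `collect_replacements` are hypothetical names, and `replacy.builder` does not exist yet. Turning the recovered words into spaCy patterns (lemmas, POS constraints) is left out.

```python
import json
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def words(text: str) -> List[str]:
    """Split into word tokens, dropping punctuation."""
    return re.findall(r"[\w']+", text)


def changed_spans(original: str, rewritten: str) -> List[Tuple[str, str]]:
    """Return (matched text, replacement text) for each replaced word span."""
    a, b = words(original), words(rewritten)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op == "replace"
    ]


def collect_replacements(spec: dict) -> Dict[str, List[Tuple[str, str]]]:
    """Map each rule name to the replacements found in its positive pairs."""
    result = {}
    for name, rule in spec.items():
        pairs = []
        for original, rewritten in rule.get("positive", []):
            pairs.extend(changed_spans(original, rewritten))
        result[name] = pairs
    return result


# Inline copy of part of the example file, so the sketch is self-contained.
spec = json.loads("""
{
  "require": {
    "positive": [
      ["I require more food", "I need more food"],
      ["Proof of ID will be required.", "Proof of ID will be needed."]
    ],
    "negative": ["But I satisfy all the requirements in the job posting!"]
  }
}
""")

replacements = collect_replacements(spec)
```

For the example file this yields `("require", "need")` and `("required", "needed")` under the "require" rule, which is enough to guess a lemma-based pattern once the sentences are parsed.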
Could use https://github.com/cyclecycle/spacy-pattern-builder for some of this