Token pattern validation doesn't support lowercase attributes
How to reproduce the behaviour
The JSON token pattern schema/validator only supports uppercase attributes.
import spacy
from spacy.matcher import Matcher, PhraseMatcher
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("a", None, [{'orth': 'a'}])
Output:
spacy.errors.MatchPatternError: Invalid token patterns for matcher rule 'a'
Pattern 0:
- Additional properties are not allowed ('orth' was unexpected) [0]
Good catch. And ugh, that's annoying... 😞 JSON schemas are case-sensitive, which I guess makes sense. Recent drafts support patternProperties, i.e. using regular expressions for keys, but this would basically make your solution in #4105 impossible. The other alternative would be to duplicate the keys and/or set up each of them as a $ref, but this would make the schema much less readable. Finally, we could stop using JSON schemas altogether, but I do think they're the right approach and I can't really think of a better system to use.
Hmm, the only other option I can think of is to normalize the patterns before checking them. I think it should be possible? You'd have to watch out for custom attributes, but otherwise the strings are always keys and the vocabulary is limited. I think, anyway?
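To make the normalization idea concrete, here is a rough sketch of what it might look like. This is not spaCy's actual implementation: `normalize_pattern` is a hypothetical helper, and it assumes that custom attributes live under the `_` key and that only the top-level keys of each token dict are attribute names.

```python
def normalize_pattern(pattern):
    """Return a copy of a token pattern with attribute names uppercased.

    Hypothetical helper, not part of spaCy. Assumes custom extension
    attributes sit under the "_" key and must be left untouched.
    """
    normalized = []
    for token_spec in pattern:
        new_spec = {}
        for key, value in token_spec.items():
            # Names under "_" are user-defined and case-sensitive, so the
            # "_" entry is copied over as-is; every other key is uppercased.
            new_key = key if key == "_" else key.upper()
            # Values (strings, or nested dicts such as {"IN": [...]}) are
            # not modified in this sketch.
            new_spec[new_key] = value
        normalized.append(new_spec)
    return normalized


print(normalize_pattern([{"orth": "a"}]))
```

The normalized pattern could then be handed to the existing schema check, so the schema itself would only ever need to list the uppercase names.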
Not sure when this changed, but allowing for the changes in Matcher syntax (the patterns are now passed as a list in the second argument), this seems to just work now.
import spacy
from spacy.matcher import Matcher, PhraseMatcher
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("a", [[{'orth': 'a'}]])
I guess this is resolved?
Yeah, I think all the keys get normalized now.