spaCy Token pattern validation doesn't support lowercase attributes

Open adrianeboyd opened this issue 6 years ago • 2 comments

How to reproduce the behaviour

The JSON token pattern schema/validator only supports uppercase attributes.

import spacy
from spacy.matcher import Matcher, PhraseMatcher
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("a", None, [{'orth': 'a'}])

Output:

spacy.errors.MatchPatternError: Invalid token patterns for matcher rule 'a'

Pattern 0:
- Additional properties are not allowed ('orth' was unexpected) [0]

Aug 12 '19 11:08 adrianeboyd

Good catch. And ugh, that's annoying... 😞 JSON schemas are case-sensitive, which I guess makes sense. Recent drafts support patternProperties, i.e. using regular expressions for keys – but this would basically make your solution in #4105 impossible. The other alternative would be to duplicate the keys and/or set up each of them as a $ref, but this would make the schema much less readable. Finally, we could not use JSON schemas but... I do think it's the right approach and I can't really think of a better system to use.

Aug 12 '19 15:08 ines
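To make the case-sensitivity point concrete, here is a minimal sketch of how `additionalProperties: false` interacts with exact keys versus `patternProperties`. It assumes a heavily simplified, hypothetical schema with only two attributes, and it uses a hand-rolled check (`unexpected_keys`, a made-up helper) rather than a real JSON Schema library or spaCy's actual schema. Character classes are used in the regexes so the example doesn't depend on regex flags.

```python
import re

# Hypothetical, heavily simplified schemas for a single token spec.
# These are NOT spaCy's real schema; they only illustrate the two keywords.
STRICT_SCHEMA = {
    "properties": {"ORTH": {}, "LOWER": {}},
    "additionalProperties": False,
}
REGEX_SCHEMA = {
    "patternProperties": {
        "^[Oo][Rr][Tt][Hh]$": {},
        "^[Ll][Oo][Ww][Ee][Rr]$": {},
    },
    "additionalProperties": False,
}


def unexpected_keys(token_spec, schema):
    """Return keys that neither 'properties' nor 'patternProperties' accounts for.

    Hand-rolled stand-in for the 'additionalProperties: false' check only.
    """
    exact = schema.get("properties", {})
    patterns = schema.get("patternProperties", {})
    return [
        key
        for key in token_spec
        if key not in exact and not any(re.search(p, key) for p in patterns)
    ]


print(unexpected_keys({"orth": "a"}, STRICT_SCHEMA))  # ['orth']
print(unexpected_keys({"orth": "a"}, REGEX_SCHEMA))   # []
```

With exact keys, `orth` is rejected as an additional property, which mirrors the error in the report. With regex keys it passes, at the cost of a schema that is harder to read.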

Hmm, the only other option I can think of is to normalize the patterns before checking them. I think it should be possible? You'd have to watch out for custom attributes, but otherwise the strings are always keys and the vocabulary is limited. I think, anyway?

Aug 12 '19 18:08 adrianeboyd
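The normalization idea above could look roughly like this. It's a minimal sketch, not spaCy's actual implementation: `normalize_token_pattern` is a hypothetical helper, and it assumes that custom attributes live under the `"_"` key and that only the top-level keys of each token spec need uppercasing. Values (including nested operator dicts) are left untouched.

```python
def normalize_token_pattern(pattern):
    """Return a copy of a token pattern with top-level attribute keys uppercased.

    Hypothetical helper for illustration; not part of spaCy.
    """
    normalized = []
    for token_spec in pattern:
        new_spec = {}
        for key, value in token_spec.items():
            if key == "_":
                # Custom extension attributes are user-defined, so keep their case.
                new_spec[key] = value
            else:
                new_spec[key.upper()] = value
        normalized.append(new_spec)
    return normalized


pattern = [{"orth": "a"}, {"lower": "b", "_": {"my_attr": True}}]
print(normalize_token_pattern(pattern))
# [{'ORTH': 'a'}, {'LOWER': 'b', '_': {'my_attr': True}}]
```

Running the validator on the normalized pattern instead of the raw one would let the schema stay uppercase-only.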

Not sure when this changed, but, allowing for the change in Matcher.add syntax (patterns are now passed as a list of patterns), this seems to just work now.

import spacy
from spacy.matcher import Matcher, PhraseMatcher
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("a", [[{'orth': 'a'}]])

I guess this is resolved?

Dec 06 '22 10:12 polm

Yeah, I think all the keys get normalized now.

Dec 13 '22 08:12 adrianeboyd

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Jan 13 '23 00:01 github-actions[bot]
