NONE error when trying to match int or bool token attributes
Trying to match int and bool attributes with CQL raises a NONE error. This seems to happen somewhere in the fsa file. It's an issue since it stops us from matching all our Token attributes.
@drupchen can you give an mcve?
Sorry for taking so long. Here it is at last:
The problem arises when we use the CQL matcher against Token attributes containing ints or booleans, such as Token.len or Token.freq.
I think the problem comes from the fact that pybo/third_party/cql.py and pybo/third_party/fsa.py only expect strings as input, but our Token attributes can be other things, like ints, lists, booleans and None.
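from pybo.cqlmatcher import CQLMatcher  # import path inferred from the traceback below

test = [{'word': 'This',
         'lemma': 'this',
         'skrt': False},
        {'word': 'is',
         'lemma': 'be',
         'freq': 0}]
q = '[skrt="False"] [freq!="1"]'
matcher = CQLMatcher(q)
matched = matcher.match(test)  # raises TypeError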
Note: cql.py assumes that a token is structured like a dict (that is why __getitem__() is implemented in token.py).
Running the code above gives the following error:
Traceback (most recent call last):
  File "/home/drupchen/PycharmProjects/Pybo/usage.py", line 12, in <module>
    matched = matcher.match(test)
  File "/home/drupchen/PycharmProjects/Pybo/pybo/cqlmatcher.py", line 28, in match
    if i + slice_len <= len(tokens_list) and self.query(tokens_list[i:i + slice_len + 1]):
  File "/home/drupchen/PycharmProjects/Pybo/pybo/third_party/cql.py", line 263, in __call__
    return list(nfa.find(tokens,debug))
  File "/home/drupchen/PycharmProjects/Pybo/pybo/third_party/fsa.py", line 98, in find
    for length in self.run(sequence[i:], False, debug):
  File "/home/drupchen/PycharmProjects/Pybo/pybo/third_party/fsa.py", line 70, in run
    if matchfunction(value):
  File "/home/drupchen/PycharmProjects/Pybo/pybo/third_party/cql.py", line 211, in match
    match = (expr.match(value[annottype]) is not None)
TypeError: expected string or bytes-like object
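The last frame suggests the regex is being handed the raw attribute value. Here is a minimal sketch that reproduces the same TypeError with only the standard re module, plus one possible workaround: coercing the value to str before matching. The helper match_attr and the anchored pattern are hypothetical and are not part of cql.py; treating a missing or None attribute as a non-match is also an assumption.

```python
import re

# Hypothetical stand-in for the compiled attribute pattern used in cql.py
expr = re.compile(r"^False$")

token = {"word": "This", "lemma": "this", "skrt": False}

# Passing the raw bool to the regex raises TypeError, as in the traceback
raised = False
try:
    expr.match(token["skrt"])
except TypeError:
    raised = True


def match_attr(expr, token, attr):
    """Hypothetical helper: coerce the attribute to str before matching.

    A missing or None attribute is treated as a non-match (an assumption).
    """
    value = token.get(attr)
    if value is None:
        return False
    return expr.match(str(value)) is not None


matches_bool = match_attr(expr, token, "skrt")                    # str(False) == "False"
matches_int = match_attr(re.compile(r"^1$"), {"freq": 0}, "freq")  # str(0) == "0"
```

With this kind of coercion, a query such as [skrt="False"] would compare against the string "False", so queries would still be written with quoted values. Lists would need a separate decision, since str() of a list is rarely what a query author wants to match against.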
"It's an issue since it stops us from matching all our Token attributes"
@ngawangtrinley can you explain why that is an issue, i.e. why we must be able to do it?
@mikkokotila I completely forgot the use case! I'll try to find some time tomorrow to have a closer look