BlackLab
Find some words near each other
BlackLab (and CQL) don't currently support ordinary "near searches", e.g. "find dog, cat and hamster within 20 words of each other".
Lucene does support these kinds of searches though, even in span form, so this shouldn't be too difficult to add. We'd probably add a function to CQL, something like:
near(list_of_queries, slop, in_order)
so you could for example query like this:
near(list("dog", [lemma="cat"], [pos="ADJ"][word="hamster"]), 20, false)
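To make the intended semantics concrete, here is a minimal Python sketch of what such a near search could compute. It is not BlackLab or Lucene code: the near_matches helper is hypothetical, it only handles single-token hits, and it assumes slop means the maximum distance between the first and last matched token, which differs in detail from Lucene's own slop definition.

```python
from itertools import product


def near_matches(position_lists, slop, in_order=False):
    """Return sorted (start, end) spans where one hit per sub-query fits in a window.

    position_lists: one list of token positions per sub-query
                    (single-token hits only, a simplification).
    slop:           ASSUMPTION: maximum distance between the first and the
                    last matched token. Lucene defines slop differently.
    in_order:       if True, hits must occur in the order the queries were given.
    """
    spans = set()
    for combo in product(*position_lists):
        if len(set(combo)) < len(combo):
            continue  # two sub-queries may not match the same token
        if in_order and list(combo) != sorted(combo):
            continue  # hits are not in the order of the query list
        start, end = min(combo), max(combo)
        if end - start <= slop:
            spans.add((start, end))
    return sorted(spans)


tokens = "the dog chased a cat while the hamster slept".split()


def positions_of(word):
    """Positions of an exact word in the toy token list."""
    return [i for i, t in enumerate(tokens) if t == word]


hits = near_matches([positions_of(w) for w in ("dog", "cat", "hamster")], 20)
```

This brute-force version tries every combination of hits, so it only serves to pin down the meaning; a real implementation would presumably delegate to Lucene's span queries.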
BTW I had a look at CWB and Sketch Engine to see what the most compatible syntax would be.
Sketch Engine has a meet function that does something similar, e.g.:
(meet [tag = "N.*"] [tag = "VB.*"] -3 3)
This Lisp-like syntax makes it difficult to pass more complex queries (because whitespace is already used as the "sequence operator" in CQL, so we can't tell where one query ends and the next starts without extra parentheses). Also, it's probably less familiar to our users, who are more likely to know e.g. Python than Lisp.
CWB has several function-like syntaxes, e.g. /codist[...] for macros, A = intersection B C for set operations, dist(...) for constraints, and MU(meet|union ...) for meet/union. Having all of these different syntaxes does not seem like a good idea to me.
I think simple imperative-style function calls as shown in my previous comment are the most pragmatic choice. This will make these kinds of features consistent and easy to use, at the cost of slightly worse CQL compatibility with other corpus engines. But as CQL is already a collection of dialects as opposed to a standard, I feel this is okay. We should document how BlackLab CQL differs from the most popular alternatives at some point.
(More or less) "pluggable" extension functions have been implemented in the feature/relations branch, so this should probably be done there as well. We need to add support for list() to pass a list of values as a parameter (this should probably be a special operator for now), but other than that it's straightforward.
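As a rough illustration of why comma-separated function calls are easy to parse, the following Python sketch splits a call into its name and top-level arguments. It is not the BlackLab parser; split_args is a hypothetical helper, and it ignores escaped quotes inside strings. Because commas only separate arguments at nesting depth zero, whitespace inside a query can keep its meaning as the sequence operator.

```python
def split_args(call):
    """Split 'name(arg, arg, ...)' into (name, [args]) at top-level commas only.

    Hypothetical helper for illustration; escaped quotes are not handled.
    """
    name, _, rest = call.partition("(")
    if not rest.endswith(")"):
        raise ValueError("expected a closing parenthesis")
    body = rest[:-1]

    args, current = [], []
    depth = 0      # nesting depth of () and [] seen so far
    quote = None   # the quote character we are inside, if any
    for ch in body:
        if quote:
            if ch == quote:
                quote = None          # closing quote
        elif ch in "\"'":
            quote = ch                # opening quote
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            # a top-level comma ends the current argument
            args.append("".join(current).strip())
            current = []
            continue
        current.append(ch)

    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return name.strip(), args


query = 'near(list("dog", [lemma="cat"], [pos="ADJ"][word="hamster"]), 20, false)'
name, args = split_args(query)
list_name, list_items = split_args(args[0])
```

Applying the same function to the first argument recovers the three sub-queries, including the two-token sequence [pos="ADJ"][word="hamster"] as a single item.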