BlackLab
Linguistic search for large annotated text corpora, based on Apache Lucene
Queries where: (1) the capture group(s) can match a variable number of tokens; (2) the non-capture-group part of the query also contains terms that can match a variable number...
Example: https://brievenalsbuit.ivdnt.org/blacklab-server/BaB/fields/titleLevel2?outputformat=json There are probably a few more requests that don't validate their input well enough. It would be nice to fix these so they produce a normal error response.
There are several places in the code where it is assumed that threads will see each other's mutations directly and in the order they were made. However, without synchronization, this is...
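To illustrate the kind of assumption described here, a minimal sketch follows. It is not BlackLab code: the class `VisibilitySketch` and its fields are hypothetical. It shows one common remedy in Java, publishing a result through a `volatile` flag so that the reader is guaranteed to see the writes made before the flag was set.

```java
public class VisibilitySketch {
    // Hypothetical fields, for illustration only.
    // The volatile write to `done` happens-before any read that observes it as true,
    // so the earlier plain write to `result` is visible to the reader as well.
    private static volatile boolean done = false;
    private static int result = 0;

    /** Starts a worker, waits for its flag, and returns the value the caller observes. */
    static int runOnce() {
        done = false;
        result = 0;
        Thread worker = new Thread(() -> {
            result = 42;  // plain write, made visible by the volatile write below
            done = true;  // volatile write: publishes everything written before it
        });
        worker.start();
        while (!done) {
            Thread.onSpinWait(); // busy-wait until the volatile flag flips
        }
        int seen = result; // guaranteed to be 42 because `done` was read as true
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining worker", e);
        }
        return seen;
    }
}
```

Without `volatile` (or another synchronization mechanism such as a lock or an atomic), the Java memory model would allow the loop to spin forever or the reader to see a stale `result`.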
HitsFromQueryParallel: if an exception (e.g. NPE) occurs in one of the SpansReaders, it disappears.
It would be better if the exception were re-thrown in the main search thread so we are alerted to the problem. Instead, the hits that this SpansReader would have retrieved...
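One way the re-throwing could look, as a rough sketch only: the class `RethrowSketch` and the method `collectAll` are hypothetical and do not reflect how HitsFromQueryParallel or SpansReader are actually structured. The idea is that workers are submitted as `Callable`s, and the calling thread unwraps the `ExecutionException` from `Future.get()` and re-throws the original cause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RethrowSketch {
    /** Runs all tasks in parallel; the first worker failure is re-thrown in the calling thread. */
    static List<Integer> collectAll(List<Callable<Integer>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Callable<Integer> task : tasks) {
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    // The worker's exception (e.g. an NPE) is the cause; surface it here
                    throw new RuntimeException("worker failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted while waiting for workers", e);
                }
            }
            return results;
        } finally {
            pool.shutdownNow(); // stop any remaining workers once we return or fail
        }
    }
}
```

With this shape, a failure in any worker aborts the whole search with a visible error rather than yielding a silently incomplete result.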
It would be nice if, for example, certain XML tags like `` would be included in the hits concordances. Right now this *can* be achieved, by passing `usecontent=orig`, but this...
Two-phase iterators are a mechanism in Lucene that can speed up SpanQueries by immediately skipping over documents that cannot possibly contain any matches, before the term vectors are fetched. It...
I'm trying to filter documents by a numeric metadata field (called **order** in this instance). I'm using the configuration file to specify that the field is numeric: ``` # Embedded...
Use cases: user has some complex pattern to extract, for instance, verb-subject-object triples. It would be nice to be able to group directly on properties of one or more of...
When a test crashes, the test index is not cleaned up. We used to clean up any old test indexes we found, but this caused issues when running tests in...
For example, `ErrorOpeningIndex` shouldn't be a checked exception, as in the vast majority of cases there's no graceful way to handle this. This leads to either it being immediately wrapped...
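As an illustration of the alternative, here is a small sketch of an unchecked exception that wraps a checked one at the boundary. All names in it (`UncheckedSketch`, `IndexOpenFailure`, `openRaw`, `open`) are hypothetical stand-ins, not BlackLab's actual classes or methods.

```java
import java.io.IOException;

public class UncheckedSketch {
    /** Hypothetical unchecked exception for "the index could not be opened". */
    static class IndexOpenFailure extends RuntimeException {
        IndexOpenFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for low-level code that fails with a checked exception. */
    static String openRaw(String path) throws IOException {
        if (path.isEmpty()) {
            throw new IOException("no index at empty path");
        }
        return "index:" + path;
    }

    /** Callers need no try/catch or throws clause; failures still carry the cause. */
    static String open(String path) {
        try {
            return openRaw(path);
        } catch (IOException e) {
            throw new IndexOpenFailure("Could not open index '" + path + "'", e);
        }
    }
}
```

Callers that can recover may still catch `IndexOpenFailure` explicitly; everyone else simply lets it propagate to a top-level handler.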