
Linguistic search for large annotated text corpora, based on Apache Lucene

103 BlackLab issues

It can be useful to skip the cache for debugging purposes, but doing so causes problems, because the code relies on calls such as `docsCount = searchParam.docsCount().executeAsync().peek();` to return a running docs count...

bug
webservice
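
A minimal sketch of the pattern this issue alludes to, with invented names (`RunningCount`, `DemoCache`, `startOrJoin`) that are not BlackLab's actual API: with a cache, repeated requests join the same running search, so `peek()` can report a count-so-far; with the cache skipped, every call starts a fresh search and there is no running count to peek at.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; these names are not BlackLab's API.
class RunningCount {
    final AtomicLong docsSoFar = new AtomicLong();   // updated while the search runs
    long peek() { return docsSoFar.get(); }          // running count, like peek()
}

class DemoCache {
    private final ConcurrentHashMap<String, RunningCount> cache = new ConcurrentHashMap<>();

    // With the cache: a second request for the same query joins the running
    // search, so peek() returns a meaningful count-so-far.
    RunningCount startOrJoin(String queryKey) {
        return cache.computeIfAbsent(queryKey, k -> startSearch());
    }

    // Skipping the cache: every call gets a brand-new RunningCount whose peek()
    // starts from 0 again, which is what breaks the docsCount logic above.
    RunningCount startSkippingCache(String queryKey) {
        return startSearch();
    }

    private RunningCount startSearch() {
        RunningCount rc = new RunningCount();
        CompletableFuture.runAsync(() -> {
            for (int i = 0; i < 1_000; i++) rc.docsSoFar.incrementAndGet(); // simulate counting docs
        });
        return rc;
    }
}
```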

[Jackson Streaming API](https://www.baeldung.com/jackson-streaming-api#writing-to-json) does most of what DataStream does now. Exceptions are contextList (lists of annotation values, which get a different structure in XML than in JSON) and all-in-one status/error...

refactor
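
A minimal Jackson Streaming API sketch using jackson-core's `JsonGenerator`; the field names below are illustrative, not BlackLab's actual response schema, but it shows the kind of incremental writing that could replace DataStream for the plain JSON case.

```java
import java.io.StringWriter;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

public class StreamingExample {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        try (JsonGenerator g = new JsonFactory().createGenerator(out)) {
            g.writeStartObject();
            g.writeStringField("status", "ok");      // illustrative fields,
            g.writeNumberField("numberOfHits", 42);  // not BlackLab's real schema
            g.writeFieldName("hits");
            g.writeStartArray();
            g.writeStartObject();
            g.writeStringField("docPid", "doc0001");
            g.writeEndObject();
            g.writeEndArray();
            g.writeEndObject();
        }
        System.out.println(out);
    }
}
```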

If a query matches an XML element (e.g. a `<p>` element), it would be nice if the attribute values of the element (such as the paragraph's ID) could be included in the...

webservice

We have `Hits.FETCH_HITS_MIN`, which is sometimes added to the requested number of hits to make sure we don't fetch a single hit at a time while we're iterating through a list...

performance
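
A rough sketch of what such a constant is for; apart from `FETCH_HITS_MIN` itself, the names here are invented rather than BlackLab's API. Each request is rounded up so at least a minimum batch of extra hits is fetched, instead of one hit per call during iteration.

```java
// Hypothetical sketch of the batching idea.
class HitsFetcher {
    static final int FETCH_HITS_MIN = 20;
    private int hitsFetched = 0;

    // Ensure at least `index + 1` hits are available, but never fetch fewer
    // than FETCH_HITS_MIN extra at once, so iterating a list doesn't trigger
    // a separate fetch for every single hit.
    void ensureHitAvailable(int index) {
        if (index < hitsFetched)
            return;
        int target = Math.max(index + 1, hitsFetched + FETCH_HITS_MIN);
        fetchUpTo(target);
        hitsFetched = target;
    }

    private void fetchUpTo(int target) {
        // ... actually read hits from the index up to `target` ...
    }
}
```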

If the user wants to know the total number of hits, but doesn't need all the hits, and is not sorting or grouping them, we might not need to instantiate...

performance
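
If only the total is needed, a counting pass could avoid materializing hit objects entirely; a sketch with an invented `HitSource` interface (not BlackLab's API):

```java
// Hypothetical sketch: when the client only wants the total and no sorting,
// grouping or listing, hits can be counted as they stream by without ever
// being stored.
interface HitSource {
    boolean nextHit(); // advance to the next match; false when exhausted
}

class HitCounter {
    static long countOnly(HitSource source) {
        long count = 0;
        while (source.nextHit())
            count++;           // no Hit object is created or kept
        return count;
    }
}
```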

Operations that use the forward index tend to be I/O-limited, and the forward index takes up a lot of disk space. As our larger corpora grow, and we've added more...

performance

E.g. if we group by `hit:word:i` (matched word(s), insensitive), and both `cat` and `Cat` appear in the corpus, the identity values `cws:word:i:cat` and `cws:word:i:Cat` denote the same group (insensitive context...

bug
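
One possible way to avoid the duplicate groups, sketched with invented names: build the group identity string from the already-desensitized context value (e.g. lowercased when the annotation is matched insensitively), so `cat` and `Cat` can no longer yield two different identities.

```java
import java.util.Locale;

// Hypothetical sketch; the exact identity format and sensitivity markers are
// assumptions based on the "cws:word:i:cat" example above.
class GroupIdentity {
    static String identity(String annotation, String value, boolean sensitive) {
        String normalized = sensitive ? value : value.toLowerCase(Locale.ROOT);
        return "cws:" + annotation + ":" + (sensitive ? "s" : "i") + ":" + normalized;
    }
}
// identity("word", "cat", false) and identity("word", "Cat", false)
// both yield "cws:word:i:cat", so they land in the same group.
```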

The current `ResultsCache` does not allow for monitoring of counts; see https://github.com/INL/BlackLab/pull/276#issuecomment-1061083756.

robustness
cache

`ResultsCache` is the name of an alternative `BlacklabCache`, but the name does not reflect its purpose. Find a better name; see https://github.com/INL/BlackLab/pull/276#issuecomment-1060532223

refactor
cache

Right now, in some (hopefully rare) scenarios, the same search could be in memory twice, wasting memory and CPU. This is because the logic that decides to remove a search (`SearchCacheEntry`) from...

performance
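
A sketch of one way to rule out the duplicate-search scenario; the types are stand-ins, not BlackLab's `SearchCacheEntry` API. The key point is that lookup-or-create and removal both go through the same map atomically, so an identical search can't be re-created while its old entry is still being evicted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration of atomic get-or-create plus conditional removal.
class DemoSearchCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

    V getOrCreate(K key, Function<K, V> create) {
        return entries.computeIfAbsent(key, create); // atomic: at most one entry per key
    }

    void evict(K key, V expected) {
        entries.remove(key, expected); // only removes if this exact entry is still cached
    }
}
```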