BlackLab
Linguistic search for large annotated text corpora, based on Apache Lucene
It can be useful to skip the cache for debugging purposes, but this currently causes problems: code such as `docsCount = searchParam.docsCount().executeAsync().peek();` relies on getting a running docs count...
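A minimal sketch of one way around this (class and method names are hypothetical, not BlackLab's actual API): when caching is disabled for debugging, the cache could still start the search and hand back a live entry so that `peek()` keeps reflecting the running count; the entry just never gets stored for reuse.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: a cache entry that can be peeked at while the search runs.
interface CacheEntry<T> {
    T peek();                       // current (possibly partial) result, e.g. a running docs count
    CompletableFuture<T> future();  // completes when the search is finished
}

// "Skip the cache" could mean "never store", not "never create an entry":
class NoReuseCache {
    <T> CacheEntry<T> getOrStart(String key, Supplier<CacheEntry<T>> startSearch) {
        return startSearch.get();   // live entry returned, peek() still works; nothing is cached
    }
}
```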
[Jackson Streaming API](https://www.baeldung.com/jackson-streaming-api#writing-to-json) does most of what DataStream does now. Exceptions are contextList (lists of values for annotations that get a different structure in XML and JSON) and all-in-one status/error...
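A hedged sketch of what writing a response with Jackson's streaming `JsonGenerator` looks like (the field names below are illustrative, not BlackLab's actual response format):

```java
import java.io.StringWriter;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

public class JacksonStreamingExample {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        JsonGenerator gen = new JsonFactory().createGenerator(out);

        gen.writeStartObject();
        gen.writeStringField("status", "ok");   // illustrative fields only
        gen.writeFieldName("hits");
        gen.writeStartArray();
        gen.writeStartObject();
        gen.writeStringField("docPid", "doc0001");
        gen.writeNumberField("start", 42);
        gen.writeNumberField("end", 43);
        gen.writeEndObject();
        gen.writeEndArray();
        gen.writeEndObject();
        gen.close();

        // {"status":"ok","hits":[{"docPid":"doc0001","start":42,"end":43}]}
        System.out.println(out);
    }
}
```

The special cases mentioned above (contextList, the all-in-one status/error output) would still need custom handling on top of this.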
If a query matches an XML element (e.g. a `<p>` element), it would be nice if the attribute values of that element (such as the paragraph's ID) could be included in the...
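Purely as an illustration of the data this would add (not an existing BlackLab class), each hit that matched an element could carry that element's attributes alongside the usual positions:

```java
import java.util.Map;

// Hypothetical shape of a hit that matched a whole element such as <p id="p_1234">:
class HitWithCapturedElement {
    int docId;
    int start, end;                 // token span of the matched element
    String elementName;             // e.g. "p"
    Map<String, String> attributes; // e.g. {"id": "p_1234"} -- what this issue asks to expose
}
```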
We have `Hits.FETCH_HITS_MIN`, which is sometimes added to the requested number of hits to make sure we don't fetch a single hit every time while we're iterating through a list...
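A sketch of the batching idea this constant supports (names and the exact constant value are illustrative):

```java
// When an iterator needs hit i but fewer hits have been fetched so far, fetch at
// least a minimum-sized batch in one go instead of one hit per call.
class BatchedHitFetcher {
    private static final int FETCH_MIN = 20;  // illustrative value
    private int fetched = 0;

    void ensureHitAvailable(int index) {
        if (index < fetched)
            return;                                   // already fetched
        int target = Math.max(index + 1, fetched + FETCH_MIN);
        fetchUpTo(target);                            // one batched fetch
    }

    private void fetchUpTo(int target) {
        // ... read hits from the index until 'target' hits are available (or the query is exhausted) ...
        fetched = target;
    }
}
```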
If the user wants to know the total number of hits, but doesn't need all the hits, and is not sorting or grouping them, we might not need to instantiate...
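A sketch of what such a count-only pass could look like over a Lucene `Spans` (assuming we already have the `Spans` for the query; in Lucene 9 the class lives in `org.apache.lucene.queries.spans`): only counters are kept, no hit objects are created.

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.spans.Spans;

class HitCountOnly {
    /** Count hits and matching docs without storing any hits. */
    static long[] countHitsAndDocs(Spans spans) throws IOException {
        long hits = 0, docs = 0;
        while (spans.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            boolean anyMatch = false;
            while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
                hits++;
                anyMatch = true;
            }
            if (anyMatch)
                docs++;
        }
        return new long[] { hits, docs };
    }
}
```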
Operations that use the forward index tend to be I/O-limited, and the forward index takes up a lot of disk space. As our larger corpora grow, and we've added more...
For example, if we group by `hit:word:i` (matched word(s), insensitive), and both `cat` and `Cat` appear in the corpus, the identity values `cws:word:i:cat` and `cws:word:i:Cat` denote the same group (insensitive context...
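One possible fix, sketched with plain JDK calls (the identity format here is simplified): normalize the value before building the group identity, so both spellings map to the same key.

```java
import java.text.Normalizer;
import java.util.Locale;

class GroupIdentity {
    /** Build a case/diacritics-insensitive group identity, e.g. cws:word:i:cat for both "cat" and "Cat". */
    static String insensitiveKey(String annotation, String value) {
        String normalized = Normalizer.normalize(value, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "")      // strip diacritics
                .toLowerCase(Locale.ROOT);     // fold case
        return "cws:" + annotation + ":i:" + normalized;
    }
}
```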
The current `ResultsCache` does not allow for monitoring of counts; see https://github.com/INL/BlackLab/pull/276#issuecomment-1061083756.
`ResultsCache` is the name of an alternative BlacklabCache, but the name does not reflect its purpose. Find a better name; see https://github.com/INL/BlackLab/pull/276#issuecomment-1060532223
Right now, in some (hopefully rare) scenarios, the same search could be in memory twice, wasting memory and CPU. This is because the logic that decides to remove a search (`SearchCacheEntry`) from...
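A sketch of the "at most one entry per query" idea using `ConcurrentHashMap` (class names hypothetical): `computeIfAbsent` guarantees that two identical requests can't each create and run their own copy of the search.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SimpleSearchCache<K, E> {
    private final ConcurrentHashMap<K, E> entries = new ConcurrentHashMap<>();

    /** Return the existing entry for this key, or atomically create and store one. */
    E getOrStart(K key, Function<K, E> startSearch) {
        return entries.computeIfAbsent(key, startSearch);
    }

    /** Removal needs to be coordinated with in-flight requests, or a concurrent
     *  getOrStart may re-create the entry while the old one is still running. */
    void remove(K key) {
        entries.remove(key);
    }
}
```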