Jan Niestadt issues

Results: 102 issues of Jan Niestadt

Improve documentation

# BlackLab Server requests - [x] For example, it is not very clear that the `patt` parameter is in Corpus Query Language (unless you override that with `pattlang`, which in...

good first issue

webservice

Option to return whole sentence as context

Right now, each match is returned with a fixed number of words as context (the `wordsaroundhit` parameter). We would like to have an option to return matches with the whole sentence...

enhancement

webservice

library

Change from Java EE to Jakarta EE to allow BLS to run in Tomcat 10

See https://inl.github.io/BlackLab/blacklab-server-overview.html#installation

webservice

Switch default XML parser to Saxon?

BlackLab uses the XML library VTD-XML by default for processing documents while indexing. This only supports XPath 1.0. @eduarddrenth made it possible to use Saxon, a more feature-rich (supports XPath...

indexing

Aggregate javadocs from multiple modules

The online Javadocs currently cover only the blackLab-engine module. Ideally, the Javadocs for all modules would be aggregated. See https://maven.apache.org/plugins/maven-javadoc-plugin/examples/aggregate.html

good first issue

documentation

Try to ensure we access the forward index sequentially

HitsFromQueryParallel may add documents in more or less random order because documents from different segments are added in parallel. This may lead to thrashing the disk cache if we're sorting/grouping...

performance
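One way to picture the access-order concern in the issue above is the sketch below. It is only an illustration, not BlackLab's code: the `Hit` class and the `inAccessOrder` helper are hypothetical, and the sketch assumes that reading forward index data in ascending document ID order is closer to sequential than reading in arrival order. Hits that arrive interleaved from several segments are reordered by document ID, then by position, before any per-hit lookup would take place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; not taken from BlackLab's HitsFromQueryParallel. */
public class SequentialAccessSketch {

    /** Minimal stand-in for a hit: document ID and token position (hypothetical type). */
    static final class Hit {
        final int doc;
        final int start;

        Hit(int doc, int start) {
            this.doc = doc;
            this.start = start;
        }
    }

    /**
     * Returns a copy of the hits ordered by document ID, then by position
     * within the document. The input list is left unchanged.
     */
    static List<Hit> inAccessOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.<Hit>comparingInt(h -> h.doc)
                .thenComparingInt(h -> h.start));
        return sorted;
    }
}
```

Whether such a reordering pays off depends on how the forward index is laid out on disk, which this sketch does not model.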

Speed up HitsFromQueryParallel by checking maxHitsToRetrieve less often

Similar to the changes in HitGroupsTokenFrequencies: it doesn't matter if we retrieve more than maxHitsToRetrieve, as this is only a guideline.

performance
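The idea in the issue above, checking the limit less often, can be sketched roughly as follows. This is not the actual HitsFromQueryParallel code: the class `BatchedLimitCheck`, the method `collect`, and the `CHECK_INTERVAL` value are all hypothetical. The sketch assumes a shared counter that several workers update; each worker counts locally and touches the shared counter once per batch, accepting that the total may exceed the limit somewhat.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not BlackLab's actual implementation. */
public class BatchedLimitCheck {

    /** Hits a worker collects between checks of the shared counter (assumed value). */
    static final int CHECK_INTERVAL = 1000;

    /**
     * Collects up to hitsAvailable hits, consulting the shared counter only
     * once per batch. Returns the number of hits this worker collected.
     */
    static long collect(long hitsAvailable, long maxHitsToRetrieve, AtomicLong globalCount) {
        long collected = 0;
        int sinceLastCheck = 0;
        while (collected < hitsAvailable) {
            collected++; // stand-in for storing one hit
            sinceLastCheck++;
            if (sinceLastCheck == CHECK_INTERVAL) {
                // One atomic update per batch instead of one per hit.
                long total = globalCount.addAndGet(sinceLastCheck);
                sinceLastCheck = 0;
                if (total >= maxHitsToRetrieve) {
                    // The limit is treated as a guideline, so overshooting
                    // by less than one batch per worker is accepted.
                    return collected;
                }
            }
        }
        globalCount.addAndGet(sinceLastCheck); // flush the final partial batch
        return collected;
    }
}
```

With a single worker, 10,000 available hits, and a limit of 2,500, this version stops at 3,000 hits after three counter updates rather than 2,500 individual checks.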

Store full indexing configuration with every index?

We have a server where users can create a custom indexing configuration and update data to be indexed. This server is running an older version of BlackLab and some of...

enhancement

indexing

Save memory by memory-mapping more files

Maybe the terms file could be memory-mapped? The format would probably need to change for this. If we integrate the forward index and content store with Solr, we will probably...

performance

fixed-by-solr

Use slices instead of leaves for concurrency

We currently use LeafReaders for concurrently gathering hits. They are divided over the available search threads more or less randomly. The problem is that each LeafReader searches a segment, which...

performance