BlackLab

Linguistic search for large annotated text corpora, based on Apache Lucene

103 BlackLab issues

HitsFromQueryParallel may add documents in more or less random order because documents from different segments are added in parallel. This may lead to thrashing the disk cache if we're sorting/grouping...

performance
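Here is a minimal sketch of one way to keep the parallelism but restore document order. Segments are searched concurrently, and the per-segment results are then concatenated in segment order, so a later sort or group pass reads documents sequentially. Segments are modelled as a doc base plus a document count. `findHitsInSegment` is a hypothetical stand-in for the real per-segment search. None of this is BlackLab's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedParallelHits {

    // Hypothetical stand-in for searching one segment: returns the global doc IDs
    // of matching documents in ascending order (here: every third document matches).
    static List<Integer> findHitsInSegment(int docBase, int maxDoc) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < maxDoc; d += 3) {
            docs.add(docBase + d);
        }
        return docs;
    }

    // Searches all segments in parallel, but concatenates the results in segment order.
    static List<Integer> collect(int[] docBases, int[] maxDocs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One future per segment, kept in segment order.
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int i = 0; i < docBases.length; i++) {
                final int base = docBases[i];
                final int max = maxDocs[i];
                futures.add(pool.submit(() -> findHitsInSegment(base, max)));
            }
            // Waiting on the futures in submission order yields ascending doc IDs,
            // no matter which segment happened to finish first.
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Three segments: docs 0-9, 10-24 and 25-29.
        System.out.println(collect(new int[] {0, 10, 25}, new int[] {10, 15, 5}, 3));
        // prints [0, 3, 6, 9, 10, 13, 16, 19, 22, 25, 28]
    }
}
```

The trade-off is that hits from later segments cannot be handed on until every earlier segment has finished, even if the later ones completed first.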

Similar to the changes in HitGroupsTokenFrequencies: it doesn't matter if we retrieve more than maxHitsToRetrieve, as this is only a guideline.

performance
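Treating maxHitsToRetrieve as a guideline simplifies parallel retrieval. If workers check a shared counter only between batches, no per-hit synchronization is needed. The cost is that the limit can be overshot by at most one batch per thread. This is a minimal sketch: the batching scheme, `retrieve` and its parameters are assumptions for illustration, not BlackLab's implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SoftHitLimit {

    // Each worker fetches hits in batches and only checks the shared counter between
    // batches. The total can therefore exceed maxHitsToRetrieve by up to
    // threads * batchSize - 1, which is fine if the limit is only a guideline.
    static int retrieve(int maxHitsToRetrieve, int threads, int batchSize, int hitsPerWorker) {
        AtomicInteger retrieved = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                int mine = 0;
                // Cheap check between batches instead of coordinating on every hit.
                while (mine < hitsPerWorker && retrieved.get() < maxHitsToRetrieve) {
                    int n = Math.min(batchSize, hitsPerWorker - mine);
                    mine += n;                 // pretend we fetched n hits
                    retrieved.addAndGet(n);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return retrieved.get();
    }

    public static void main(String[] args) {
        // Result is at least 1000 and at most 999 + 4 * 64, depending on thread timing.
        System.out.println(retrieve(1000, 4, 64, 10_000));
    }
}
```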

We have a server where users can create a custom indexing configuration and upload data to be indexed. This server is running an older version of BlackLab and some of...

enhancement
indexing

Maybe the terms file could be memory-mapped? The format would probably need to change for this. If we integrate the forward index and content store with Solr, we will probably...

performance
fixed-by-solr
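Here is a rough sketch of what a memory-mappable terms file could look like, using the JDK's `FileChannel.map`. The file holds a term count, an offset table and the concatenated UTF-8 term bytes. With that layout, term i can be decoded on demand without loading the whole file onto the heap. The layout and the `MappedTerms` class are hypothetical; this is not BlackLab's current terms format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class MappedTerms {
    // Hypothetical layout: [int n][int offset_0 .. offset_n][UTF-8 bytes of all terms].
    // Term i occupies bytes offset_i (inclusive) to offset_(i+1) (exclusive) of the byte area.
    static void write(Path file, List<String> terms) throws IOException {
        byte[][] encoded = new byte[terms.size()][];
        int total = 0;
        for (int i = 0; i < terms.size(); i++) {
            encoded[i] = terms.get(i).getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * (terms.size() + 1) + total);
        buf.putInt(terms.size());
        int offset = 0;
        for (byte[] e : encoded) {
            buf.putInt(offset);
            offset += e.length;
        }
        buf.putInt(offset);
        for (byte[] e : encoded) {
            buf.put(e);
        }
        Files.write(file, buf.array());
    }

    private final MappedByteBuffer map;
    private final int n;
    private final int dataStart;

    MappedTerms(Path file) throws IOException {
        map = mapReadOnly(file);
        n = map.getInt(0);
        dataStart = 4 + 4 * (n + 1);
    }

    private static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    int size() {
        return n;
    }

    // Decodes term i on demand; the OS only pages in the parts of the file we touch.
    String get(int i) {
        int start = map.getInt(4 + 4 * i);
        int end = map.getInt(4 + 4 * (i + 1));
        byte[] bytes = new byte[end - start];
        for (int k = 0; k < bytes.length; k++) {
            bytes[k] = map.get(dataStart + start + k);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("terms", ".bin");
        file.toFile().deleteOnExit();
        write(file, List.of("boom", "huis", "kat"));
        MappedTerms terms = new MappedTerms(file);
        System.out.println(terms.size() + " " + terms.get(1));  // prints "3 huis"
    }
}
```

A single `MappedByteBuffer` is limited to `Integer.MAX_VALUE` bytes, so a real terms file larger than 2 GB would need to be split over several mappings.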

We currently use LeafReaders for concurrently gathering hits. They are divided over the available search threads more or less randomly. The problem is that each LeafReader searches a segment, which...

performance
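If segments differ in size, a random distribution can leave one thread with much more work than the others. A common remedy is the greedy "longest processing time first" heuristic. Sort the segments by size, largest first, then repeatedly give the next segment to the thread with the least work so far. This is not necessarily what BlackLab does. The sketch assumes segment size (for example document or token count) is a reasonable proxy for search cost. `assign` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SegmentBalancer {

    // Returns, per thread, the indexes of the segments that thread should search.
    static List<List<Integer>> assign(long[] segmentSizes, int threads) {
        // Segment indexes, largest segment first.
        Integer[] order = new Integer[segmentSizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> segmentSizes[i]).reversed());

        List<List<Integer>> perThread = new ArrayList<>();
        long[] load = new long[threads];
        for (int t = 0; t < threads; t++) {
            perThread.add(new ArrayList<>());
        }
        // Min-heap of thread indexes ordered by their current total load.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.comparingLong((Integer t) -> load[t]));
        for (int t = 0; t < threads; t++) {
            heap.add(t);
        }

        for (int seg : order) {
            int t = heap.poll();              // least-loaded thread so far
            perThread.get(t).add(seg);
            load[t] += segmentSizes[seg];
            heap.add(t);                      // re-insert with its updated load
        }
        return perThread;
    }

    public static void main(String[] args) {
        System.out.println(assign(new long[] {50, 40, 30, 20, 10}, 2));
    }
}
```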

When grouping, the user doesn't always need all hits. If they don't, it's more efficient to just count the group sizes and not store each hit. This does mean that...

performance
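The difference between storing hits per group and only counting them can be sketched with plain Java collections. Hits are generic objects here, and `groupKey` stands in for whatever property the user groups by. Both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class GroupCounting {

    // Full grouping: keeps every hit, so memory grows with the number of hits.
    static <H> Map<String, List<H>> groupHits(Iterable<H> hits, Function<H, String> groupKey) {
        Map<String, List<H>> groups = new HashMap<>();
        for (H hit : hits) {
            groups.computeIfAbsent(groupKey.apply(hit), k -> new ArrayList<>()).add(hit);
        }
        return groups;
    }

    // Count-only grouping: one counter per group; each hit is discarded after counting.
    static <H> Map<String, Integer> countGroups(Iterable<H> hits, Function<H, String> groupKey) {
        Map<String, Integer> counts = new HashMap<>();
        for (H hit : hits) {
            counts.merge(groupKey.apply(hit), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("Huis", "huis", "boom");
        System.out.println(countGroups(hits, s -> s.toLowerCase()));
    }
}
```

The count-only variant needs memory proportional to the number of groups rather than the number of hits. However, it can no longer list the hits in a group.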

(Requested by @JessedeDoes) Expand TSV input type to be able to deal with the [CoNLL-U format](http://universaldependencies.org/format.html). The format is basically a TSV with some special features (points 2 and 3):...

enhancement
good first issue
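The excerpt above is truncated, so it is unclear which features points 2 and 3 refer to. The sketch below classifies lines according to features documented in the CoNLL-U specification:

- comment lines start with `#`;
- a blank line ends a sentence;
- word lines have ten tab-separated columns;
- an ID can be a range such as `1-2` (a multiword token) or a decimal such as `8.1` (an empty node).

The `ConlluLines` class is a hypothetical helper, not part of BlackLab's TSV input type.

```java
class ConlluLines {

    enum Kind { COMMENT, SENTENCE_BREAK, WORD, MULTIWORD_TOKEN, EMPTY_NODE }

    // Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    static final int COLUMNS = 10;

    static Kind classify(String line) {
        if (line.isBlank()) {
            return Kind.SENTENCE_BREAK;       // a blank line ends the sentence
        }
        if (line.startsWith("#")) {
            return Kind.COMMENT;              // e.g. "# text = ..." sentence metadata
        }
        String[] fields = line.split("\t", -1);
        if (fields.length != COLUMNS) {
            throw new IllegalArgumentException("expected " + COLUMNS + " columns, got " + fields.length);
        }
        String id = fields[0];
        if (id.contains("-")) {
            return Kind.MULTIWORD_TOKEN;      // e.g. "1-2": one surface token spanning words 1 and 2
        }
        if (id.contains(".")) {
            return Kind.EMPTY_NODE;           // e.g. "8.1": a node with no surface token
        }
        return Kind.WORD;
    }

    public static void main(String[] args) {
        String[] sample = {
            "# text = v\u00e1monos al mar",
            String.join("\t", "1-2", "v\u00e1monos", "_", "_", "_", "_", "_", "_", "_", "_"),
            String.join("\t", "1", "vamos", "ir", "_", "_", "_", "_", "_", "_", "_"),
            ""
        };
        for (String line : sample) {
            System.out.println(classify(line));
        }
        // prints COMMENT, MULTIWORD_TOKEN, WORD, SENTENCE_BREAK (one per line)
    }
}
```

One plausible indexing policy, again an assumption, is to index only WORD lines as token positions. Under that policy, multiword-token forms would be kept as an extra annotation and empty nodes would be skipped.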

If there's no disk space left while indexing, the JVM will crash (segmentation fault in java.nio.Bits.copyFromIntArray). Indexing will probably fail if there's less than 2 GB of disk space at...

enhancement
indexing
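Here is a minimal sketch of a pre-flight check that fails with a readable error instead of a JVM crash. It uses the standard `java.io.File.getUsableSpace()`. The 2 GB threshold is taken from the rough estimate in the issue. `ensureFreeSpace` is a hypothetical helper. Where to call it (once before indexing, or periodically between documents) is also an assumption, since a single check up front cannot catch space running out later.

```java
import java.io.File;

class DiskSpaceCheck {

    // Threshold from the issue's rough estimate ("less than 2 GB"); treat it as tunable.
    static final long MIN_FREE_BYTES = 2L * 1024 * 1024 * 1024;

    // Throws a descriptive exception instead of letting indexing run out of space mid-write.
    static void ensureFreeSpace(File indexDir, long minFreeBytes) {
        long usable = indexDir.getUsableSpace();  // 0 if the path does not name a partition
        if (usable < minFreeBytes) {
            throw new IllegalStateException(String.format(
                "Not enough disk space in %s: %d MB usable, at least %d MB required",
                indexDir.getAbsolutePath(), usable / (1024 * 1024), minFreeBytes / (1024 * 1024)));
        }
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        ensureFreeSpace(indexDir, MIN_FREE_BYTES);
        System.out.println("Enough free space in " + indexDir.getAbsolutePath());
    }
}
```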

e.g. https://corpora.ato.ivdnt.org/blacklab-server/search-test/index.html?corpus-url=...&patt=... Useful for mailing examples of issues, etc.

enhancement
webservice
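Here is a small sketch of building such a shareable link, with the query values percent-encoded via the JDK's `URLEncoder`. The page path and the `corpus-url` and `patt` parameter names come from the example URL above. The sketch assumes these two parameters are all that is needed, because the actual values in the example are elided. `link` is a hypothetical helper.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class SearchTestLink {

    // Page and parameter names taken from the example URL in the issue.
    static final String PAGE = "https://corpora.ato.ivdnt.org/blacklab-server/search-test/index.html";

    static String link(String corpusUrl, String pattern) {
        Map<String, String> params = new LinkedHashMap<>();  // keeps parameter order stable
        params.put("corpus-url", corpusUrl);
        params.put("patt", pattern);
        return PAGE + "?" + params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Hypothetical corpus URL and pattern, for illustration only.
        System.out.println(link("https://example.org/blacklab-server/mycorpus/", "[lemma=\"huis\"]"));
    }
}
```

`URLEncoder` applies form encoding, so spaces become `+`. Most servers accept this in query strings, but it is worth checking against the search-test page.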

When BlackLab fetches a page from a document, some information required to render the document context properly may be missing. Example 1: namespaces. (cf. related issue...

bug
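This sketch covers only the namespace example. A fragment that uses a prefix declared on the full document's root element is not well-formed XML on its own. One possible workaround is to wrap the fragment in a synthetic root that re-declares those namespaces. This assumes the prefix-to-URI mapping of the original document is known. `wrap`, `parses` and the `fragment` root element are hypothetical, not BlackLab's API.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class FragmentNamespaces {

    // Wraps a fragment in a synthetic root that re-declares the given namespaces.
    // An empty prefix stands for the default namespace (xmlns="...").
    static String wrap(String fragment, Map<String, String> prefixToUri) {
        String decls = prefixToUri.entrySet().stream()
            .map(e -> (e.getKey().isEmpty() ? " xmlns=\"" : " xmlns:" + e.getKey() + "=\"") + e.getValue() + "\"")
            .collect(Collectors.joining());
        return "<fragment" + decls + ">" + fragment + "</fragment>";
    }

    // True if the string parses as namespace-well-formed XML.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;  // e.g. "The prefix ... is not bound"
        }
    }

    public static void main(String[] args) {
        String fragment = "<tei:w>huis</tei:w>";
        Map<String, String> ns = Map.of("tei", "http://www.tei-c.org/ns/1.0");
        System.out.println(parses(fragment));            // false: prefix not bound
        System.out.println(parses(wrap(fragment, ns)));  // true
    }
}
```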