BlackLab issues

Try to ensure we access the forward index sequentially

Comments: 2

HitsFromQueryParallel may add documents in a more or less random order, because documents from different segments are added in parallel. This may lead to thrashing the disk cache if we're sorting/grouping...

Author: jan-niestadt

Labels: performance
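One possible mitigation, sketched under assumptions: collect the hits first, then sort them by document ID (and start position) before doing any forward-index lookups, so documents are visited in ID order rather than in whatever order the parallel segment threads produced them. This only helps if the forward index lays documents out roughly in ID order, which is an assumption here. `SeqHit` and `sortForSequentialAccess` are hypothetical names, not BlackLab's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hit representation: global document ID plus token span.
final class SeqHit {
    final int doc;
    final int start;
    final int end;

    SeqHit(int doc, int start, int end) {
        this.doc = doc;
        this.start = start;
        this.end = end;
    }
}

final class SequentialAccess {
    // Returns a sorted copy: by document ID, then by start position.
    // Forward-index lookups done in this order handle all hits of a document
    // together and move through documents in ID order, instead of following
    // the order in which the parallel segment threads happened to add them.
    static List<SeqHit> sortForSequentialAccess(List<SeqHit> hits) {
        List<SeqHit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.<SeqHit>comparingInt(h -> h.doc)
                .thenComparingInt(h -> h.start));
        return sorted;
    }
}
```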

Speed up HitsFromQueryParallel by checking maxHitsToRetrieve less often

Similar to the changes in HitGroupsTokenFrequencies: it doesn't matter if we retrieve more than maxHitsToRetrieve hits, because this limit is only a guideline.

Author: jan-niestadt

Labels: performance
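A minimal sketch of the idea, assuming each worker thread gathers hits from one segment and all workers share an `AtomicLong` total. Instead of consulting the shared counter for every hit, a worker publishes its count and checks the limit only once every `CHECK_INTERVAL` hits. `CoarseLimitWorker`, `gather` and `CHECK_INTERVAL` are hypothetical; the actual HitsFromQueryParallel code may be structured quite differently.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

final class CoarseLimitWorker {
    // Assumed value: a larger interval means fewer shared updates but more overshoot.
    static final int CHECK_INTERVAL = 1000;

    // nextHit returns the next hit (any non-negative int), or -1 when the segment
    // is exhausted. sink receives every retrieved hit. globalCount is shared by all workers.
    static void gather(IntSupplier nextHit, IntConsumer sink,
                       AtomicLong globalCount, long maxHitsToRetrieve) {
        int sinceLastCheck = 0;
        while (true) {
            int hit = nextHit.getAsInt();
            if (hit < 0) {
                break;
            }
            sink.accept(hit);
            if (++sinceLastCheck == CHECK_INTERVAL) {
                // Publish this worker's batch and check the limit only now.
                long total = globalCount.addAndGet(sinceLastCheck);
                sinceLastCheck = 0;
                if (total >= maxHitsToRetrieve) {
                    return; // enough hits overall; stop this worker
                }
            }
        }
        globalCount.addAndGet(sinceLastCheck); // segment done: publish the remainder
    }
}
```

The trade-off is that the final total can overshoot `maxHitsToRetrieve` by up to roughly `CHECK_INTERVAL` hits per worker, which is harmless if the limit is only a guideline, while the number of updates to the shared counter drops by about that same factor.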

Store full indexing configuration with every index?

Comments: 1

We have a server where users can create a custom indexing configuration and upload data to be indexed. This server is running an older version of BlackLab and some of...

Author: jan-niestadt

Labels: enhancement, indexing

Save memory by memory-mapping more files

Maybe the terms file could be memory-mapped? The format would probably need to change for this. If we integrate the forward index and content store with Solr, we will probably...

Author: jan-niestadt

Labels: performance, fixed-by-solr

Use slices instead of leaves for concurrency

Comments: 1

We currently use LeafReaders for concurrently gathering hits. They are divided over the available search threads more or less randomly. The problem is that each LeafReader searches a segment, which...

Author: jan-niestadt

Labels: performance
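One way to balance the work, sketched under the assumption that segment size (document count) is a reasonable proxy for search cost: group the segments into as many slices as there are search threads, using the greedy "largest segment first, into the least-loaded slice" heuristic. This is a generic load-balancing technique, not necessarily how Lucene or BlackLab form slices; `SliceBalancer.balance` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class SliceBalancer {
    // Returns, for each slice, the indexes of the segments assigned to it.
    // Segments are taken largest first; each goes to the slice with the
    // smallest total so far (ties broken by slice index).
    static List<List<Integer>> balance(int[] segmentSizes, int numSlices) {
        if (numSlices <= 0) {
            throw new IllegalArgumentException("numSlices must be positive");
        }
        List<List<Integer>> slices = new ArrayList<>();
        for (int i = 0; i < numSlices; i++) {
            slices.add(new ArrayList<>());
        }
        long[] load = new long[numSlices];
        // Min-heap of slice indexes ordered by current load. A slice is removed
        // before its load changes and re-added afterwards, keeping the heap valid.
        PriorityQueue<Integer> leastLoaded = new PriorityQueue<>(
                Comparator.comparingLong((Integer s) -> load[s]).thenComparingInt(s -> s));
        for (int i = 0; i < numSlices; i++) {
            leastLoaded.add(i);
        }

        // Segment indexes sorted by size, descending.
        Integer[] order = new Integer[segmentSizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(segmentSizes[b], segmentSizes[a]));

        for (int seg : order) {
            int slice = leastLoaded.poll();
            slices.get(slice).add(seg);
            load[slice] += segmentSizes[seg];
            leastLoaded.add(slice);
        }
        return slices;
    }
}
```

For example, segment sizes {100, 10, 60, 50, 30} split over 2 slices give slice totals of 130 and 120, whereas handing out whole segments at random can leave one thread with most of the work.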

Allow grouping without storing all hits

Comments: 1

When grouping, the user doesn't always need all hits. If they don't, it's more efficient to just count the group sizes and not store each hit. This does mean that...

Author: jan-niestadt

Labels: performance
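A minimal sketch of counting-only grouping, assuming hits arrive as a stream and a hypothetical `groupKey` function extracts the grouping property (for example the matched word). Only one counter per distinct group is kept, so memory grows with the number of groups rather than the number of hits. Because the sketch keeps no hits, listing the hits in a group would need another pass over the results. `CountingGrouper` is not part of BlackLab.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

final class CountingGrouper {
    // Consumes the hits once and returns group key -> number of hits in that group.
    static <H, K> Map<K, Integer> countGroups(Iterator<H> hits, Function<H, K> groupKey) {
        Map<K, Integer> sizes = new HashMap<>();
        while (hits.hasNext()) {
            // Increment the counter for this hit's group; the hit itself is discarded.
            sizes.merge(groupKey.apply(hits.next()), 1, Integer::sum);
        }
        return sizes;
    }
}
```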

Support for CoNLL-U format

Comments: 1

(Requested by @JessedeDoes) Expand TSV input type to be able to deal with the [CoNLL-U format](http://universaldependencies.org/format.html). The format is basically a TSV with some special features (point 2 and 3):...

Author: jan-niestadt

Labels: enhancement, good first issue
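To illustrate what a TSV-based reader would have to handle beyond plain TSV, here is a sketch that classifies lines according to the CoNLL-U specification: blank lines end a sentence, lines starting with `#` are comments, and token lines have 10 tab-separated fields whose ID is an integer, a range such as `1-2` (multiword token) or a decimal such as `8.1` (empty node). It is not necessarily the complete set of features the issue refers to, and `ConlluLine` is a hypothetical helper, not BlackLab code.

```java
final class ConlluLine {
    enum Kind { SENTENCE_BREAK, COMMENT, WORD, MULTIWORD_TOKEN, EMPTY_NODE }

    static Kind classify(String line) {
        if (line.trim().isEmpty()) {
            return Kind.SENTENCE_BREAK; // blank line: end of sentence
        }
        if (line.startsWith("#")) {
            return Kind.COMMENT;        // e.g. "# sent_id = 1" or "# text = ..."
        }
        // Limit -1 keeps trailing empty fields so the count is exact.
        String[] fields = line.split("\t", -1);
        if (fields.length != 10) {
            throw new IllegalArgumentException("Expected 10 fields, got " + fields.length);
        }
        String id = fields[0];
        if (id.contains("-")) {
            return Kind.MULTIWORD_TOKEN; // ID range such as "1-2"
        }
        if (id.contains(".")) {
            return Kind.EMPTY_NODE;      // decimal ID such as "8.1"
        }
        return Kind.WORD;                // plain integer ID
    }
}
```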

JVM crash if no disk space left while indexing

If there's no disk space left while indexing, the JVM will crash (segmentation fault in java.nio.Bits.copyFromIntArray). Indexing will probably fail if there's less than 2 GB of disk space at...

Author: jan-niestadt

Labels: enhancement, indexing
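A possible partial mitigation, sketched under assumptions: check the usable space on the file store that holds the index before indexing starts, and fail with a clear error instead of a JVM crash. It uses the standard `Files.getFileStore(path).getUsableSpace()` call; `DiskSpaceGuard` and the 2 GB threshold (taken from the rough figure in the issue) are hypothetical. A pre-check can't prevent the disk from filling up partway through a run, so periodic re-checks during indexing might also be needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DiskSpaceGuard {
    // Assumed threshold, mirroring the issue's rough estimate; not a measured requirement.
    static final long DEFAULT_MIN_FREE_BYTES = 2L * 1024 * 1024 * 1024;

    // True if the file store containing indexDir (which must exist) has at least
    // minFreeBytes available to this JVM.
    static boolean hasEnoughSpace(Path indexDir, long minFreeBytes) throws IOException {
        long usable = Files.getFileStore(indexDir).getUsableSpace();
        return usable >= minFreeBytes;
    }

    // Throws a descriptive exception instead of starting an indexing run that is
    // likely to run out of disk space.
    static void checkBeforeIndexing(Path indexDir) throws IOException {
        if (!hasEnoughSpace(indexDir, DEFAULT_MIN_FREE_BYTES)) {
            throw new IOException("Less than 2 GB of usable disk space left at "
                    + indexDir + "; not starting indexing");
        }
    }
}
```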

/search-test/ URLs should include search

For example: https://corpora.ato.ivdnt.org/blacklab-server/search-test/index.html?corpus-url=...&patt=... This would be useful for emailing examples of issues, etc.

Author: jan-niestadt

Labels: enhancement, webservice

Paging issues: more context required for document rendition

Comments: 1

When BlackLab fetches a page from a document, some information may be missing that is required for a good rendition of the document context. Example 1: namespaces. (cf. related issue...

Author: JessedeDoes

Labels: bug
