BlackLab

Linguistic search for large annotated text corpora, based on Apache Lucene

Results 103 BlackLab issues

By default, Lucene will merge segment files into a single compound file to use fewer file descriptors. This takes a bit of time (7-33% during indexing according to [docs](https://cwiki.apache.org/confluence/display/lucene/ImproveIndexingSpeed)) and...

performance

If all unique terms combined total more than 2GB of character data, TermsReader will break. See [TermsReader](https://github.com/INL/BlackLab/blob/evolve-hits-interfaces/engine/src/main/java/nl/inl/blacklab/forwardindex/TermsReader.java#L218):
```java
// FIXME this code breaks when char term data total more than...
```

performance
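A 2GB ceiling in Java code often comes from tracking sizes or offsets in a 32-bit `int`. Whether that is the exact cause in TermsReader is an assumption here, since the snippet above is truncated. The sketch below is not BlackLab code: the class `TermOffsets` and its methods are hypothetical and only illustrate how an `int` running total wraps silently once the combined term data passes `Integer.MAX_VALUE`, and two ways to avoid or detect that.

```java
// Hypothetical illustration, not taken from BlackLab's TermsReader.
class TermOffsets {

    /** Sums term lengths into an int; wraps silently past Integer.MAX_VALUE. */
    static int totalCharsInt(int[] termLengths) {
        int total = 0;
        for (int len : termLengths) {
            total += len; // overflow is not reported
        }
        return total;
    }

    /** Same sum in a long, which has room for totals well beyond 2GB. */
    static long totalCharsLong(int[] termLengths) {
        long total = 0;
        for (int len : termLengths) {
            total += len;
        }
        return total;
    }

    /** Keeps the int type but fails fast with ArithmeticException on overflow. */
    static int checkedTotal(int[] termLengths) {
        int total = 0;
        for (int len : termLengths) {
            total = Math.addExact(total, len);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] lengths = {1_500_000_000, 1_500_000_000};
        System.out.println(totalCharsInt(lengths));  // negative: the sum wrapped
        System.out.println(totalCharsLong(lengths)); // 3000000000
    }
}
```

Switching to `long` offsets fixes the arithmetic, but a single Java array still cannot hold more than about 2^31 elements, so data of that size would also need to be split across several arrays or buffers.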

Right now a value is sent, but it is never updated if we add or remove documents from an existing corpus. If we properly keep track of this, we can...

enhancement

Some hits occur twice, due to indexing both lemma and word in one annotation. The setting `allowDuplicateValues: false` should have prevented this. https://corpusoudnederlands.ivdnt.org/corpus-frontend/ONL/search/hits?first=0&number=20&patt=%5Bword_or_lemma%3D%22Ne%22%5D%5Bword_or_lemma%3D%22willen%22%5D&interface=%7B%22form%22%3A%22search%22%2C%22patternMode%22%3A%22simple%22%7D

bug

Right now, multithreading works per-file. This means that a single large file that contains many documents cannot use more than one CPU core. We could try reading a few input...

enhancement
indexing
performance
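To make the per-file limitation concrete, here is a minimal sketch of the alternative: submitting each document from one large file as its own task, so several cores can work on the same file. This is only one possible approach and not necessarily what the issue proposes, since its text is truncated. `PerDocumentIndexing`, `indexDocument` and `indexFile` are hypothetical names, and token counting stands in for real indexing work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not BlackLab's indexing code.
class PerDocumentIndexing {

    /** Stand-in for indexing a single document: counts whitespace-separated tokens. */
    static int indexDocument(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Processes the documents of one file in parallel and returns the total token count. */
    static int indexFile(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per document rather than one task per file.
            List<Future<Integer>> results = new ArrayList<>();
            for (String doc : documents) {
                results.add(pool.submit(() -> indexDocument(doc)));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for that document to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real indexer the file would still have to be split into documents by a single reader, so the achievable speedup depends on how expensive that splitting is compared with the per-document work.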

Certain (rare) inputs can cause this error. We're trying to set a payload for a non-existent position. How can this happen? Could this be an empty document containing only a...

bug
indexing

Some files still mix tabs and spaces. This tends to cause commit noise, inconsistent indentation in diffs, etc. We should consider switching the import style to using...

refactor

See comment in colloc.js (on branch better-integration-tests)

bug

Publish static site: https://medium.com/@danieljimgarcia/publishing-static-sites-to-github-pages-using-github-actions-8040f57dfeaf
Trigger GitHub action when a folder changes: https://www.techielass.com/trigger-a-github-action-workflow-when-a-folder-changes/

documentation

We don't recommend using this anymore as it's usually better to leave caching up to the OS. We can still refer people to the official GitHub page (https://github.com/hoytech/vmtouch) if they...