BlackLab
Linguistic search for large annotated text corpora, based on Apache Lucene
By default, Lucene will merge segment files into a single compound file to use fewer file descriptors. This takes a bit of time (7-33% of indexing time according to the [docs](https://cwiki.apache.org/confluence/display/lucene/ImproveIndexingSpeed)) and...
If all unique terms combined total more than 2GB of character data, TermsReader will break. See [TermsReader](https://github.com/INL/BlackLab/blob/evolve-hits-interfaces/engine/src/main/java/nl/inl/blacklab/forwardindex/TermsReader.java#L218):
```java
// FIXME this code breaks when char term data total more than...
```
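A minimal sketch of one way around such a limit, assuming the problem comes from addressing all term characters with a single `int` offset (Java arrays and `int` indices top out at roughly 2^31 elements). The class `ChunkedTermData` and its methods are hypothetical and are not part of BlackLab; the idea is only to show `long` offsets mapped onto several smaller `char[]` chunks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: term characters spread over several chunks, addressed by long offsets. */
final class ChunkedTermData {
    private final int chunkSize;
    private final List<char[]> chunks = new ArrayList<>();
    private long length = 0; // total chars stored; a long, so it may exceed Integer.MAX_VALUE

    ChunkedTermData(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Appends a term and returns the global offset of its first character. */
    long append(String term) {
        long start = length;
        for (int i = 0; i < term.length(); i++) {
            int chunk = (int) (length / chunkSize); // which chunk holds this char
            int pos = (int) (length % chunkSize);   // position inside that chunk
            if (chunk == chunks.size()) {
                chunks.add(new char[chunkSize]);    // allocate chunks lazily
            }
            chunks.get(chunk)[pos] = term.charAt(i);
            length++;
        }
        return start;
    }

    /** Reads len characters starting at the given global offset, crossing chunk borders if needed. */
    String read(long offset, int len) {
        if (offset < 0 || len < 0 || offset + len > length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", len=" + len);
        }
        StringBuilder sb = new StringBuilder(len);
        for (long p = offset; p < offset + len; p++) {
            sb.append(chunks.get((int) (p / chunkSize))[(int) (p % chunkSize)]);
        }
        return sb.toString();
    }

    long length() {
        return length;
    }
}
```

With a chunk size well below `Integer.MAX_VALUE`, the total can grow past 2GB while every individual array stays within the JVM's limits. Copying character by character keeps the sketch short; a real implementation would likely copy ranges with `System.arraycopy`.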
Right now a value is sent, but it is never updated if we add or remove documents from an existing corpus. If we properly keep track of this, we can...
Some hits occur twice, due to indexing both lemma and word in one annotation. The setting `allowDuplicateValues: false` should have prevented this. https://corpusoudnederlands.ivdnt.org/corpus-frontend/ONL/search/hits?first=0&number=20&patt=%5Bword_or_lemma%3D%22Ne%22%5D%5Bword_or_lemma%3D%22willen%22%5D&interface=%7B%22form%22%3A%22search%22%2C%22patternMode%22%3A%22simple%22%7D
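To illustrate the behaviour expected from `allowDuplicateValues: false`, here is a small sketch that drops exact duplicate values at a single token position before they would be indexed. The class `PositionValues` and the method `distinctValues` are hypothetical names for illustration, not BlackLab code, and the sketch assumes duplicates are compared as exact, case-sensitive strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical sketch: removes duplicate values gathered for one token position. */
final class PositionValues {
    /**
     * Returns the values to index at one position, keeping the first occurrence
     * of each value and preserving the original order.
     */
    static List<String> distinctValues(List<String> valuesAtPosition) {
        // LinkedHashSet drops exact duplicates while keeping insertion order
        return new ArrayList<>(new LinkedHashSet<>(valuesAtPosition));
    }
}
```

If word and lemma are both "ne" at a position, only one "ne" remains, so a query on the combined annotation would match that position once. Values that differ only in case are kept as separate entries in this sketch.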
Right now, multithreading works per-file. This means that a single large file that contains many documents cannot use more than one CPU core. We could try reading a few input...
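As a rough illustration of the limitation, the sketch below submits one task per document instead of one task per file, so a single large file can keep several cores busy. This is not necessarily the approach the issue goes on to propose. `PerDocumentIndexing`, `processDocuments` and the `indexDocument` callback are hypothetical names, and the sketch assumes the documents of a file have already been split into separate strings by a single reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: parallelism per document rather than per input file. */
final class PerDocumentIndexing {
    /** Runs indexDocument on every document using a fixed pool; results keep the input order. */
    static <R> List<R> processDocuments(List<String> documents, int threads,
                                        Function<String, R> indexDocument) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<R> task = () -> indexDocument.apply(doc); // one task per document
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that document is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Splitting the file into documents still happens on one thread here, so the gain depends on parsing being cheap relative to the per-document work.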
Certain (rare) inputs can cause this error. We're trying to set a payload for a non-existent position. How can this happen? Could this be an empty document containing only a...
There are still files with mixed tabs and spaces around. This tends to cause commit noise, inconsistent indenting in diffs, etc. We should consider switching the import style to using...
See comment in colloc.js (on branch better-integration-tests)
Publish static site: https://medium.com/@danieljimgarcia/publishing-static-sites-to-github-pages-using-github-actions-8040f57dfeaf Trigger GitHub action when a folder changes: https://www.techielass.com/trigger-a-github-action-workflow-when-a-folder-changes/
We don't recommend using this anymore as it's usually better to leave caching up to the OS. We can still refer people to the official GitHub page (https://github.com/hoytech/vmtouch) if they...