Jan Niestadt

102 issues by Jan Niestadt

The method used to manually advance systemList.head and call system.removeFromEngine(), but removeSystem() already takes care of that. This eventually resulted in a null reference. The offending lines were removed, and...
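The fix described above can be illustrated with a minimal sketch. It assumes a singly linked list of systems in which a single removal routine does all the unlinking and cleanup. The class `SystemList`, its `Node` type, and the `removed` flag (standing in for `system.removeFromEngine()`) are hypothetical names for illustration, not the project's actual code.

```java
/** Hypothetical sketch of a system list where removeSystem() owns all unlinking. */
class SystemList {
    static final class Node {
        final String name;
        Node next;
        boolean removed; // stand-in for the cleanup done when a system leaves the engine

        Node(String name) {
            this.name = name;
        }
    }

    Node head;

    /** Adds a system at the front of the list and returns its node. */
    Node add(String name) {
        Node node = new Node(name);
        node.next = head;
        head = node;
        return node;
    }

    /** Unlinks the node, updates head if needed, and performs the cleanup. */
    void removeSystem(Node node) {
        if (head == node) {
            head = node.next;
        } else {
            Node prev = head;
            while (prev != null && prev.next != node) {
                prev = prev.next;
            }
            if (prev == null) {
                return; // node is not in this list
            }
            prev.next = node.next;
        }
        node.next = null;
        node.removed = true;
    }

    /**
     * Removes every system. The loop does not advance head itself:
     * removeSystem() already moves head forward, so advancing it here as well
     * would skip nodes and could end in a null dereference.
     */
    void removeAllSystems() {
        while (head != null) {
            removeSystem(head);
        }
    }
}
```

Keeping the pointer bookkeeping in one place means the "remove all" loop cannot drift out of step with the single-removal logic.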

Useful while debugging to detect if an entity that should have been removed is still in the engine, or if an active component is somehow still in the ComponentPool (maybe...

By default, Lucene will merge segment files into a single compound file to use fewer file descriptors. This takes a bit of time (7-33% during indexing according to [docs](https://cwiki.apache.org/confluence/display/lucene/ImproveIndexingSpeed)) and...

performance

If all unique terms combined total more than 2GB of character data, TermsReader will break. See [TermsReader](https://github.com/INL/BlackLab/blob/evolve-hits-interfaces/engine/src/main/java/nl/inl/blacklab/forwardindex/TermsReader.java#L218): `// FIXME this code breaks when char term data total more than...`

performance
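For the TermsReader size limit above, one general way to get past a limit tied to `int` indexing is to spread the character data over several fixed-size chunks and address it with a `long` offset. The sketch below only illustrates that idea. `ChunkedTermData` and its methods are hypothetical and do not reflect how TermsReader actually stores its data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: term characters stored in fixed-size chunks, addressed by a long offset. */
class ChunkedTermData {
    private final int chunkSize;
    private final List<char[]> chunks = new ArrayList<>();
    private long length = 0; // total number of chars stored, may exceed Integer.MAX_VALUE

    ChunkedTermData(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Appends a term and returns the offset at which it starts. */
    long append(String term) {
        long start = length;
        for (int i = 0; i < term.length(); i++) {
            int posInChunk = (int) (length % chunkSize);
            if (posInChunk == 0) {
                chunks.add(new char[chunkSize]); // current chunk is full (or none exists yet)
            }
            chunks.get((int) (length / chunkSize))[posInChunk] = term.charAt(i);
            length++;
        }
        return start;
    }

    /** Reads termLength chars starting at offset; the term may span chunk boundaries. */
    String get(long offset, int termLength) {
        if (offset < 0 || termLength < 0 || offset + termLength > length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + termLength);
        }
        StringBuilder sb = new StringBuilder(termLength);
        for (long p = offset; p < offset + termLength; p++) {
            sb.append(chunks.get((int) (p / chunkSize))[(int) (p % chunkSize)]);
        }
        return sb.toString();
    }
}
```

A real implementation would use a large chunk size and copy whole ranges rather than single characters. The small per-character loop here just keeps the addressing logic easy to follow.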

Right now a value is sent, but it is never updated if we add or remove documents from an existing corpus. If we properly keep track of this, we can...

enhancement

Some hits occur twice, due to indexing both lemma and word in one annotation. The setting `allowDuplicateValues: false` should have prevented this. https://corpusoudnederlands.ivdnt.org/corpus-frontend/ONL/search/hits?first=0&number=20&patt=%5Bword_or_lemma%3D%22Ne%22%5D%5Bword_or_lemma%3D%22willen%22%5D&interface=%7B%22form%22%3A%22search%22%2C%22patternMode%22%3A%22simple%22%7D

bug

Right now, multithreading works per-file. This means that a single large file that contains many documents cannot use more than one CPU core. We could try reading a few input...

enhancement
indexing
performance
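The per-file multithreading limitation above could be eased by handing individual documents, rather than whole files, to a thread pool. The following is a rough sketch of that idea using `java.util.concurrent`. `PerDocumentIndexing`, `indexDocument`, and `indexAll` are hypothetical names, and the token count is only a placeholder for real indexing work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: documents from one large input are processed on several threads. */
class PerDocumentIndexing {

    /** Placeholder for real per-document indexing work: counts whitespace-separated tokens. */
    static int indexDocument(String document) {
        String trimmed = document.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Submits one task per document and returns the total token count. */
    static int indexAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String doc : documents) {
                Callable<Integer> task = () -> indexDocument(doc);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for that document's task to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Indexing interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Indexing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, a single file containing many documents can keep several cores busy, provided the documents can be split off cheaply by the reading thread.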

Certain (rare) inputs can cause this error. We're trying to set a payload for a non-existent position. How can this happen? Could this be an empty document containing only a...

bug
indexing

There are still files with mixed tabs and spaces around. This tends to cause commit noise, inconsistent indenting in diffs, etc. We should consider switching the import style to using...

refactor

See comment in colloc.js (on branch better-integration-tests)

bug