Jan Niestadt

102 issues by Jan Niestadt

E.g. if we group by `hit:word:i` (matched word(s), insensitive), and both `cat` and `Cat` appear in the corpus, the identity values `cws:word:i:cat` and `cws:word:i:Cat` denote the same group (insensitive context...

bug
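The issue above concerns insensitive grouping keys. A minimal sketch of the intended behaviour: when grouping insensitively, normalize the value *before* building the identity string, so that `cat` and `Cat` collapse into one group. This is hypothetical and not BlackLab code. The `cws:<annotation>:<sensitivity>:<value>` layout is assumed from the single example in the issue, the `s` code for "sensitive" is an assumption, and plain lowercasing stands in for whatever desensitizing BlackLab really applies (which may also strip diacritics).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class InsensitiveGroupKeys {
    // Hypothetical helper: builds a group identity such as "cws:word:i:cat".
    // Assumption: "insensitive" is modelled as lowercasing only.
    static String identity(String annotation, boolean insensitive, String value) {
        String normalized = insensitive ? value.toLowerCase(Locale.ROOT) : value;
        return "cws:" + annotation + ":" + (insensitive ? "i" : "s") + ":" + normalized;
    }

    // Counts hits per group identity, keeping first-seen order of the groups.
    static Map<String, Integer> countGroups(List<String> hits, boolean insensitive) {
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String hit : hits) {
            groups.merge(identity("word", insensitive, hit), 1, Integer::sum);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("cat", "Cat", "dog");
        System.out.println(countGroups(hits, true));  // {cws:word:i:cat=2, cws:word:i:dog=1}
        System.out.println(countGroups(hits, false)); // {cws:word:s:cat=1, cws:word:s:Cat=1, cws:word:s:dog=1}
    }
}
```

Normalizing at key-construction time means that two identity strings can only be equal when they denote the same group, so no separate equivalence check is needed afterwards.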

Right now, in some (hopefully rare) scenarios, the same search could be held in memory twice, wasting memory and CPU. This is because the logic that decides to remove a search (`SearchCacheEntry`) from...

performance

When metadata is retrieved from a document that contains the metadata of many documents, e.g. a CSV file where every row contains metadata for a document, right now there is...

indexing

(Reported by Vincent Vandeghinste) Indexing errors aren't always reported clearly. Try indexing this single XML file (zipped because GitHub doesn't support .xml attachments) as folia: [karrewiet-folia-fout.zip](https://github.com/INL/BlackLab/files/4425524/karrewiet-folia-fout.zip) In corpus-frontend, this will...

indexing

See https://github.com/INL/corpus-frontend/issues/358 We'd like to know whether the client's connection to BlackLab is considered to be in debug mode (configurable by IP), so we know whether we can use `usecache=no` to...

webservice

Right now, we keep track of how many running "searches" we have, but some of those may just be a thread waiting for another thread to finish, while others actually...

performance

To reduce index size for sparse annotations, BlackLab recognizes when empty strings are indexed for a number of successive tokens and replaces this with a position gap (not storing anything...

indexing
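A minimal sketch of the position-gap idea described above, assuming a Lucene-style position increment (each stored term records how far it is from the previous stored term). Empty values are not stored; each skipped token simply widens the gap before the next real value. The class and method names are hypothetical, and for simplicity this version skips every empty value, whereas the issue describes BlackLab doing this for runs of successive empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class SparseAnnotationGaps {
    // A stored value plus its position increment relative to the previous stored value.
    record Posting(String value, int positionIncrement) {}

    // Drops empty values and folds them into the increment of the next non-empty value.
    static List<Posting> toPostings(List<String> valuesPerToken) {
        List<Posting> postings = new ArrayList<>();
        int increment = 1; // Lucene convention: start at position -1, first increment 1 -> position 0
        for (String value : valuesPerToken) {
            if (value.isEmpty()) {
                increment++; // nothing stored for this token; just remember the gap
            } else {
                postings.add(new Posting(value, increment));
                increment = 1;
            }
        }
        return postings;
    }

    // Reconstructs absolute token positions from the increments, to show nothing is lost.
    static List<Integer> positions(List<Posting> postings) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (Posting p : postings) {
            position += p.positionIncrement();
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Posting> postings = toPostings(List.of("", "", "noun", "", "verb"));
        System.out.println(positions(postings)); // [2, 4]
    }
}
```

Because the increments still add up to the original token positions, searches on the sparse annotation keep lining up with the other annotations while the empty tokens cost no index space. This is also why a query like `[lemma=""]` needs special care: there is no stored term to match for those tokens.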

If you search for `[lemma=""]`, it is possible to match the extra closing token at the end of a document (that exists to store the last bit of punctuation and...

bug

The [example](https://inl.github.io/BlackLab/how-to-configure-indexing.html#subproperties) in the documentation suggests that this isn't the case, but that was with the old way of indexing subannotations, in a single Lucene field. Now each subannotation gets...

indexing

- [ ] For elements that only have short values (e.g. lemma), a single-line input field that can be confirmed just by pressing Enter would be nice.
- [X] For...

question?
beta GUI
time-short