BlackLab
Linguistic search for large annotated text corpora, based on Apache Lucene
Bumps [browserify-sign](https://github.com/crypto-browserify/browserify-sign) from 4.2.1 to 4.2.2. Changelog Sourced from browserify-sign's changelog. v4.2.2 - 2023-10-25 Fixed [Tests] log when openssl doesn't support cipher [#37](https://github.com/crypto-browserify/browserify-sign/issues/37) Commits Only apps should have lockfiles 09a8995...
Bumps [@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse) from 7.21.2 to 7.23.2. Release notes Sourced from @babel/traverse's releases. v7.23.2 (2023-10-11) NOTE: This release also re-publishes @babel/core, even if it does not appear in the linked release...
In chn-intern, running the TermSerialization tool finds terms that don't correctly "round-trip" (i.e. get the id for the term, then get the term for that id again), although not too...
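A minimal sketch of the round-trip check described above, in plain Java. It does not use BlackLab's actual TermSerialization tool or terms API; `TermRoundTrip`, `findFailures` and the two lookup functions are hypothetical names chosen for illustration. The case-folding lookup in `main` is only an assumed example of how a round-trip could fail, not a diagnosis of this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.ToIntFunction;

final class TermRoundTrip {

    /**
     * Returns every term for which term -> id -> term does not yield the original term.
     * toId and toTerm are hypothetical stand-ins for the real term dictionary lookups.
     */
    static List<String> findFailures(Iterable<String> terms,
                                     ToIntFunction<String> toId,
                                     IntFunction<String> toTerm) {
        List<String> failures = new ArrayList<>();
        for (String term : terms) {
            int id = toId.applyAsInt(term);      // step 1: get the id for the term
            String back = toTerm.apply(id);      // step 2: get the term for that id
            if (!term.equals(back)) {
                failures.add(term);              // the term did not round-trip
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Toy dictionary whose lookup folds case (an assumption for the demo),
        // so "Dog" maps to the id of "dog" and fails to round-trip.
        List<String> table = Arrays.asList("dog", "cat");
        List<String> terms = Arrays.asList("Dog", "dog", "cat");
        List<String> failures = findFailures(
                terms,
                t -> table.indexOf(t.toLowerCase()),
                table::get);
        System.out.println("Terms that do not round-trip: " + failures);
    }
}
```

Passing the lookups in as functions keeps the check independent of how ids are actually stored, so the same loop could be pointed at a real dictionary.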
See e.g. https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/util/automaton/RegExp.html#COMPLEMENT :+1: > The reserved characters used in the (enabled) syntax must be escaped with backslash (\) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is...
If you try to access an index that BlackLab cannot read because it was indexed with an older version, it crashes instead of returning a clean error response. (e.g. try...
The experimental Solr module doesn't yet have support for creating/deleting private user indexes and formats and adding documents. This functionality should be added eventually.
For supporting custom hit-level annotations added by individual users (stored in a separate database), as well as parallel corpora, it would be very useful to be able to give a...
BlackLab (and CQL) don't currently support ordinary "near searches", e.g. "find _dog_, _cat_ and _hamster_ within 20 words of each other". Lucene does support these kinds of searches though, even...
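To make the semantics of such a near search concrete, here is a small sketch in plain Java over an in-memory token list. It is not BlackLab, CQL or Lucene code; `NearSearch`, `smallestWindow` and `near` are hypothetical names, and it assumes "within N words of each other" means the first and last matched tokens are at most N positions apart.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NearSearch {

    /**
     * Returns {start, end} (inclusive token positions) of the smallest window
     * that contains every term at least once, or null if there is no such window.
     */
    static int[] smallestWindow(List<String> tokens, Set<String> terms) {
        if (terms.isEmpty()) {
            return null; // nothing to search for
        }
        Map<String, Integer> counts = new HashMap<>();
        int covered = 0;   // number of distinct terms currently inside the window
        int left = 0;
        int[] best = null;
        for (int right = 0; right < tokens.size(); right++) {
            String in = tokens.get(right);
            if (terms.contains(in) && counts.merge(in, 1, Integer::sum) == 1) {
                covered++;
            }
            // Shrink from the left for as long as the window still covers all terms.
            while (covered == terms.size()) {
                if (best == null || right - left < best[1] - best[0]) {
                    best = new int[] {left, right};
                }
                String out = tokens.get(left++);
                if (terms.contains(out) && counts.merge(out, -1, Integer::sum) == 0) {
                    covered--;
                }
            }
        }
        return best;
    }

    /** True if all terms occur with at most maxDistance positions between first and last. */
    static boolean near(List<String> tokens, Set<String> terms, int maxDistance) {
        int[] window = smallestWindow(tokens, terms);
        return window != null && window[1] - window[0] <= maxDistance;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "the dog chased a cat and a hamster slept".split(" "));
        Set<String> terms = new HashSet<>(Arrays.asList("dog", "cat", "hamster"));
        System.out.println(near(tokens, terms, 20));
    }
}
```

The sliding window visits each token at most twice, so the check is linear in document length. An index-backed implementation would work from term positions rather than scanning tokens.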
It seems clients cannot reliably reconstruct the punct from XML responses, as it's joined with the indentation. https://github.com/INL/BlackLab/blob/72194b794e03e87c10d406b0e3e37ba8373a6aa3/server/src/main/java/nl/inl/blacklab/server/datastream/DataStreamXml.java#L232-L235
It would be interesting to see how easy (or not) it would be to implement scoring on hits, including things like term boosting, norms, etc. We've pretty much ignored this...