BlackLab
Linguistic search for large annotated text corpora, based on Apache Lucene
TO DO:
- set up SolrCloud cluster
- index documents on cluster
- search cluster
To allow users to still search older private corpora that are no longer supported by current BlackLab versions (because new Lucene versions drop support for older indexes), we could have...
This field only exists to pass the HTTP status code between Solr and the proxy and shouldn't be in the proxy's response to its client. Instead it should set the...
This Solr URL works:
http://localhost:8983/solr/test/select?bl.op=hits&bl.patt=%22the%22&indent=true&q.op=OR&q=*%3A*
But the proxy doesn't understand the different response structure it receives when usecontent=orig ("Expected START_OBJECT, found VALUE_STRING").
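For reproducing the request from code, here is a minimal sketch that rebuilds the working URL from its decoded parameters. The class and method names (`SolrHitsUrl`, `build`, `example`) are hypothetical and not part of BlackLab or Solr. The issue does not show how `usecontent=orig` is passed, so the sketch leaves it out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of BlackLab or Solr.
class SolrHitsUrl {

    // Appends the URL-encoded parameters to the base URL, preserving map order.
    static String build(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return baseUrl + "?" + query;
    }

    // Rebuilds the URL quoted in the issue from its decoded parameter values.
    static String example() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("bl.op", "hits");
        params.put("bl.patt", "\"the\"");   // encoded as %22the%22
        params.put("indent", "true");
        params.put("q.op", "OR");
        params.put("q", "*:*");             // encoded as *%3A*
        return build("http://localhost:8983/solr/test/select", params);
    }
}
```

Calling `SolrHitsUrl.example()` should give the same URL as quoted above, which can then be sent to both Solr and the proxy to compare the two response structures.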
When using BLS (in "AutoSearch mode") to add data to an index, the index status is being polled while the indexing takes place. This works fine in the beginning, but...
IntelliJ (rightfully) warns about a non-atomic operation on a volatile variable:
https://github.com/INL/BlackLab/blob/512719637ab986533ac9c9aaf7a575e7d3e0d586/engine/src/main/java/nl/inl/blacklab/search/results/HitGroupsTokenFrequencies.java#L299
Fix this by replacing the int and long with AtomicInteger and AtomicLong. It won't crash, but group sizes may too...
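The proposed fix could look roughly like the sketch below. The class, field, and method names (`GroupCounters`, `groupSize`, `totalTokens`, `addHit`) are hypothetical stand-ins, not the actual members of `HitGroupsTokenFrequencies`. The point is only that `incrementAndGet()` and `addAndGet()` are single atomic read-modify-write operations, whereas `++` or `+=` on a `volatile` field is a separate read and write that can lose updates under concurrency.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the counters; not the actual BlackLab class.
class GroupCounters {
    // Previously (assumed): volatile int / volatile long updated with ++ and +=.
    private final AtomicInteger groupSize = new AtomicInteger();
    private final AtomicLong totalTokens = new AtomicLong();

    // Records one hit; each update is atomic, so no increments are lost.
    void addHit(long tokens) {
        groupSize.incrementAndGet();
        totalTokens.addAndGet(tokens);
    }

    int groupSize() { return groupSize.get(); }

    long totalTokens() { return totalTokens.get(); }

    // Small demonstration: several threads update the same counters.
    static GroupCounters countInParallel(int threads, int hitsPerThread) {
        GroupCounters counters = new GroupCounters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    counters.addHit(2);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        }
        return counters;
    }
}
```

Note that the two counters are each updated atomically but not together; if readers need a consistent pair, a lock or a single combined value would be required instead.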
If a user tries to group the any token query `[]` by a single annotation, right now this is resolved using `HitGroupsTokenFrequencies`. This is faster than finding hits first, then...
Refactor how annotatedfield/annotations are registered with the index metadata. Right now, this seems to be done twice: once by calling `AnnotatedFields.addFromConfig(ConfigAnnotatedField)` and again by calling `IndexMetadata.registerAnnotatedField(AnnotatedFieldWriter)`. See `DocIndexerExample.createAnnotatedFieldWriter()`. Check if...
"Tags" inside a CDATA are seen as actual (unbalanced) XML open tags, and closing tags are added at the end of the document. Example: https://portal.clarin.ivdnt.org/blacklab-server-new/opensonar/docs/WR-P-E-C-0000000129/contents?query=%5Bword%3D%22schip%22%5D&wordstart=7000
Certain queries such as `[lemma="cat"] [lemma!="dog"]{10}` can produce a bunch of overlapping hits (cat followed by 1 non-dog; cat followed by 2 non-dogs; etc.). For certain queries, you want all...