BlackLab

Linguistic search for large annotated text corpora, based on Apache Lucene

Results 103 BlackLab issues

TO DO:
- set up SolrCloud cluster
- index documents on cluster
- search cluster

solr

To allow users to still search older private corpora that are no longer supported by current BlackLab versions (because new Lucene versions drop support for older indexes), we could have...

proxy

This field only exists to pass the http status code between Solr and the proxy and shouldn't be in the proxy's response to its client. Instead it should set the...

proxy

This Solr URL works: http://localhost:8983/solr/test/select?bl.op=hits&bl.patt=%22the%22&indent=true&q.op=OR&q=*%3A* But the proxy doesn't understand the different response structure it receives when usecontent=orig ("Expected START_OBJECT, found VALUE_STRING").

proxy
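The query string in the Solr URL above is percent-encoded, which makes the BlackLab parameters hard to read. As a minimal sketch (the class and method names are hypothetical and not part of BlackLab or Solr), the parameters can be decoded with the standard library like this:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of BlackLab or Solr.
public class SolrQueryDecodeSketch {

    /** Splits a raw query string on '&' and percent-decodes each key and value. */
    public static Map<String, String> decodeQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps the original parameter order
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Query string copied from the URL quoted in the issue.
        String query = "bl.op=hits&bl.patt=%22the%22&indent=true&q.op=OR&q=*%3A*";
        decodeQuery(query).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Decoded, `bl.patt` is the quoted pattern `"the"` and `q` is the match-all query `*:*`.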

When using BLS (in "AutoSearch mode") to add data to an index, the index status is being polled while the indexing takes place. This works fine in the beginning, but...

indexing

IntelliJ (rightfully) warns about a non-atomic operation on a volatile variable: https://github.com/INL/BlackLab/blob/512719637ab986533ac9c9aaf7a575e7d3e0d586/engine/src/main/java/nl/inl/blacklab/search/results/HitGroupsTokenFrequencies.java#L299 Fix this by replacing the int and long with AtomicInteger and AtomicLong. It won't crash, but group sizes may too...

bug
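The fix proposed in the issue above can be illustrated with a small sketch. It does not reproduce the code in `HitGroupsTokenFrequencies`; the class, field, and method names below are hypothetical stand-ins. The point is that `x++` on a `volatile` field is a separate read and write, so concurrent increments can be lost, whereas `AtomicInteger` and `AtomicLong` perform the update as one atomic step.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;

// Hypothetical example class; the real counters live in HitGroupsTokenFrequencies.
public class AtomicCounterSketch {

    // Stand-ins for the volatile int and long mentioned in the issue.
    private final AtomicInteger docsSeen = new AtomicInteger();
    private final AtomicLong tokensSeen = new AtomicLong();

    /** Records one document; safe to call from several threads at once. */
    public void recordDocument(long tokenCount) {
        docsSeen.incrementAndGet();       // atomic read-modify-write
        tokensSeen.addAndGet(tokenCount); // atomic read-modify-write
    }

    public int docsSeen() {
        return docsSeen.get();
    }

    public long tokensSeen() {
        return tokensSeen.get();
    }

    /** Records the given number of documents from a parallel stream. */
    public static AtomicCounterSketch countInParallel(int docs, long tokensPerDoc) {
        AtomicCounterSketch counters = new AtomicCounterSketch();
        IntStream.range(0, docs).parallel().forEach(i -> counters.recordDocument(tokensPerDoc));
        return counters;
    }

    public static void main(String[] args) {
        AtomicCounterSketch counters = countInParallel(100_000, 3);
        System.out.println(counters.docsSeen() + " docs, " + counters.tokensSeen() + " tokens");
    }
}
```

With plain `volatile` fields and `++`, the same parallel loop could report fewer documents than were actually processed.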

If a user tries to group the any token query `[]` by a single annotation, right now this is resolved using `HitGroupsTokenFrequencies`. This is faster than finding hits first, then...

performance

Refactor how annotatedfield/annotations are registered with the index metadata. Right now, this seems to be done twice: once by calling `AnnotatedFields.addFromConfig(ConfigAnnotatedField)` and again by calling `IndexMetadata.registerAnnotatedField(AnnotatedFieldWriter)`. See `DocIndexerExample.createAnnotatedFieldWriter()`. Check if...

refactor

"Tags" inside a CDATA are seen as actual (unbalanced) XML open tags, and closing tags are added at the end of the document. Example: https://portal.clarin.ivdnt.org/blacklab-server-new/opensonar/docs/WR-P-E-C-0000000129/contents?query=%5Bword%3D%22schip%22%5D&wordstart=7000

bug

Certain queries such as `[lemma="cat"] [lemma!="dog"]{10}` can produce a bunch of overlapping hits (cat followed by 1 non-dog; cat followed by 2 non-dogs; etc.). For certain queries, you want all...

enhancement