Multivalued annotation displaying/sorting/grouping/statistics
Not sure whether this has been submitted before, but e.g. only one of multiple lemmata is displayed
(http://svotmc10.ivdnt.loc/corpus-frontend/Gysseling/search/hits?first=0&number=20&patt=%5Bword%3D%22tantwordene%22%26lemma%3D%22antwoorden%22%5D&interface=%7B%22form%22%3A%22search%22%2C%22patternMode%22%3A%22extended%22%7D)
This could be tricky to do as the concordances are created from the forward index, which at present only stores the first value indexed at every position.
One option is to create the concordances from the content store (this used to be how we did it, and should still be possible with the parameter usecontent=orig). This basically returns part of the original XML, so all the values should be in there. But getting this to work with corpus-frontend might be a challenge, as the concordances would have a project-specific XML structure.
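A minimal sketch of what such a request could look like, reusing the pattern from the link above. The base URL and the helper name hits_url are assumptions made for illustration; only usecontent=orig and the pattern itself come from this report.

```python
from urllib.parse import urlencode

# Hypothetical server location; replace with the real one.
BASE_URL = "http://localhost:8080/blacklab-server"
CORPUS = "Gysseling"


def hits_url(patt, use_content="orig", first=0, number=20):
    """Build a hits request that asks for concordances from the content store."""
    params = {
        "patt": patt,
        "first": first,
        "number": number,
        # "orig" requests snippets of the original XML instead of the forward index
        "usecontent": use_content,
    }
    return f"{BASE_URL}/{CORPUS}/hits?{urlencode(params)}"


url = hits_url('[word="tantwordene"&lemma="antwoorden"]')
print(url)
```

The response would then contain project-specific XML, which is exactly the part corpus-frontend would need to learn how to render.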
There are several related problems:
- While searching on multivalued annotations works fine, we cannot display more than one value (due to the aforementioned forward-index issue).
- Because the forward index only stores the first value, grouping and sorting will not work on the second and subsequent values of a token.
- Additionally, we must decide whether or not to include a token with multiple values in multiple groups. There is also ambiguity in counting such hits: when a token has lemma=a and lemma=b, does the query lemma=a|b produce one hit or two?
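To make the grouping question concrete, here is a small sketch of three possible policies. The hit data and function names are invented for illustration and do not reflect how the server actually groups.

```python
from collections import Counter

# Made-up hits: each hit is one token carrying one or more lemma values.
hits = [
    {"lemma": ["a", "b"]},
    {"lemma": ["a"]},
    {"lemma": ["b"]},
]


def group_first_value(hits):
    """One group per hit, keyed on the first value only.

    This is all a forward index that stores only the first value can offer.
    """
    return Counter(h["lemma"][0] for h in hits)


def group_every_value(hits):
    """A hit joins the group of each of its values.

    Group sizes can then add up to more than the number of hits.
    """
    return Counter(v for h in hits for v in h["lemma"])


def group_concatenated(hits, sep="|"):
    """One group per hit, keyed on all of its values joined together."""
    return Counter(sep.join(h["lemma"]) for h in hits)


print(group_first_value(hits))
print(group_every_value(hits))
print(group_concatenated(hits))
```

With three hits, the first and third policies keep the group sizes summing to three, while the second counts the two-valued token twice (sum of four). Which of these is "right" is the decision that still has to be made.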
The common workaround we landed on is to index the value twice, in two different annotations:
- once tokenized
- once concatenated
Then:
- give both annotations the same name
- configure the frontend using custom js
  - display the concatenated version in the results (using concordanceAnnotationId)
  - use the tokenized one for searching (setting searchAnnotationId)
  - use the concatenated one for grouping and sorting operations, hide the other one from those options
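The workaround above boils down to deriving two forms from the same set of values. A rough sketch, assuming a "|" separator and a hypothetical helper index_forms (neither is taken from the actual indexer configuration):

```python
def index_forms(values, sep="|"):
    """Return both indexed forms of a multivalued annotation for one token.

    tokenized:    each value on its own, all at the same token position
                  (used for searching)
    concatenated: a single joined value
                  (used for display, grouping and sorting)
    """
    unique = list(dict.fromkeys(values))  # drop repeated values, keep order
    return {"tokenized": unique, "concatenated": sep.join(unique)}


def matches(forms, query_value):
    """Searching checks the tokenized form, so any single value matches."""
    return query_value in forms["tokenized"]


forms = index_forms(["a", "b"])
print(forms["concatenated"])  # shown in results, used as group/sort key
print(matches(forms, "b"))    # second value is still searchable
```

Because the concatenated annotation holds exactly one value per token, the first-value-only forward index no longer loses anything for display, grouping or sorting.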