BlackLab
allowDuplicateValues doesn't always work correctly?
Some hits occur twice because both lemma and word are indexed in one annotation. The setting allowDuplicateValues: false should have prevented this.
https://corpusoudnederlands.ivdnt.org/corpus-frontend/ONL/search/hits?first=0&number=20&patt=%5Bword_or_lemma%3D%22Ne%22%5D%5Bword_or_lemma%3D%22willen%22%5D&interface=%7B%22form%22%3A%22search%22%2C%22patternMode%22%3A%22simple%22%7D
Config: https://github.com/INL/corpus-frontend-config/blob/master/ONL/ONL.blf.yaml
The default for allowDuplicateValues is now false.
We should probably double-check whether this issue is reproducible.
This was caused by a (malformed?) document where the containerPath would match document contents twice:
# What element (relative to document) contains this field's contents?
# (if omitted, entire document is used)
containerPath: .//text
<text>
  <group>
    <text>
      <body>
        ....
      </body>
    </text>
  </group>
</text>
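To illustrate the double match, here is a minimal sketch using Python's standard xml.etree.ElementTree as a stand-in for BlackLab's own XPath evaluation (which may behave differently in detail). The <doc> wrapper, the sample body text and the helper name find_containers are assumptions made for this example; they are not taken from the actual ONL data or from BlackLab.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a <doc> root wrapping the nested <text> structure
# shown above. The body content is invented for illustration.
NESTED_DOC = """<doc>
  <text>
    <group>
      <text>
        <body>ne willen</body>
      </text>
    </group>
  </text>
</doc>"""


def find_containers(xml_string, container_path=".//text"):
    """Return every element matched by container_path, relative to the root."""
    root = ET.fromstring(xml_string)
    return root.findall(container_path)


containers = find_containers(NESTED_DOC)

# Both the outer and the inner <text> match the descendant path.
print(len(containers))

# Each matched container includes the same <body>, so its tokens
# would be visited once per container.
bodies_per_container = [len(c.findall(".//body")) for c in containers]
print(bodies_per_container)
```

If this mirrors what happens during indexing, the same tokens are processed once for each matching container, which would explain duplicate hits regardless of allowDuplicateValues. A containerPath that can only match one level (for example a direct child path instead of a descendant path) might avoid it, but that depends on the real document structure.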