
allowDuplicateValues doesn't always work correctly?

Open jan-niestadt opened this issue 2 years ago • 2 comments

Some hits occur twice, due to indexing both lemma and word in one annotation. The setting allowDuplicateValues: false should have prevented this.

https://corpusoudnederlands.ivdnt.org/corpus-frontend/ONL/search/hits?first=0&number=20&patt=%5Bword_or_lemma%3D%22Ne%22%5D%5Bword_or_lemma%3D%22willen%22%5D&interface=%7B%22form%22%3A%22search%22%2C%22patternMode%22%3A%22simple%22%7D

jan-niestadt, Aug 19 '22 10:08

Config: https://github.com/INL/corpus-frontend-config/blob/master/ONL/ONL.blf.yaml

jan-niestadt, Aug 19 '22 10:08

The default for allowDuplicateValues is now false.

We should probably double-check whether this issue is reproducible.

jan-niestadt, Sep 22 '22 12:09

This was caused by a (possibly malformed) document in which the containerPath matched the document contents twice:

# What element (relative to document) contains this field's contents?
# (if omitted, entire document is used)
containerPath: .//text

Document structure (note the nested <text> elements):

<text> 
    <group>
        <text> 
            <body>
               ....
            </body>
        </text>
    </group>
</text>

KCMertens, May 01 '23 10:05
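
The double match described above can be reproduced outside BlackLab with a small sketch. This is not BlackLab's indexing code; it only illustrates how a descendant path such as `.//text` selects both the outer and the nested `<text>` element, so any content inside the inner one is visited twice. The `<doc>` wrapper, the `<w>` word elements, and the sample words are assumptions made for illustration and do not come from the ONL configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <doc> and <w> are illustrative names only.
DOC = """
<doc>
  <text>
    <group>
      <text>
        <body><w>Ne</w><w>willen</w></body>
      </text>
    </group>
  </text>
</doc>
""".strip()

root = ET.fromstring(DOC)

# ".//text" selects every <text> descendant, including the nested one.
containers = root.findall(".//text")

# Processing each matched container independently visits the inner words twice.
words_seen = [w.text for c in containers for w in c.iter("w")]

print(len(containers))  # 2
print(words_seen)       # ['Ne', 'willen', 'Ne', 'willen']
```

If the nesting is intentional rather than a malformed document, a more specific container path that selects only one level of `<text>` would avoid the overlap; which path is appropriate depends on the actual corpus format.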