FVH BaseFragmentsBuilder does not properly support colored pre/post tags

Open MateuxLucax opened this issue 4 months ago • 0 comments

Description

Given the BaseFragmentsBuilder description:

...
/**
 * Base FragmentsBuilder implementation that supports colored pre/post tags and multivalued fields.
 *
 * <p>Uses {@link BoundaryScanner} to determine fragments.
 */
public abstract class BaseFragmentsBuilder implements FragmentsBuilder {
...

We would expect that, given a query and arrays of pre and post tags, each query clause is highlighted with the tags at the same position in those arrays, like:

Query   Pre tag   Post tag
A B     <ab>      </ab>
C B     <cb>      </cb>
C A     <ca>      </ca>

However, the tags are not applied in that order, because the current BaseFragmentsBuilder implementation picks them in an almost random order:

protected String getPreTag(String[] preTags, int num) {
  int n = num % preTags.length;
  return preTags[n];
}
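
To make the mismatch concrete, here is a minimal standalone sketch. Only the body of getPreTag is taken from the snippet above; the class name, the tagsFor helper and the two number sequences are made up for illustration and are not values observed from Lucene. It shows that the tag order depends entirely on which number each clause receives:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; only the body of getPreTag mirrors the Lucene snippet above.
class PreTagSelectionSketch {

  // Same logic as BaseFragmentsBuilder.getPreTag: the index wraps around the tag array.
  static String getPreTag(String[] preTags, int num) {
    int n = num % preTags.length;
    return preTags[n];
  }

  // Hypothetical helper: returns the pre tag picked for each query clause,
  // given the number assigned to that clause (clause i has assignedNumbers[i]).
  static List<String> tagsFor(String[] preTags, int[] assignedNumbers) {
    List<String> picked = new ArrayList<>();
    for (int num : assignedNumbers) {
      picked.add(getPreTag(preTags, num));
    }
    return picked;
  }

  public static void main(String[] args) {
    String[] preTags = {"<ab>", "<cb>", "<ca>"};

    // Assumed numbering that follows query order: each clause gets its own tag.
    System.out.println(tagsFor(preTags, new int[] {0, 1, 2}));

    // Assumed numbering that diverged from query order: the tags are shuffled.
    System.out.println(tagsFor(preTags, new int[] {2, 0, 1}));
  }
}
```

With the second sequence, the clause "A B" would be wrapped in <ca> instead of <ab>, which is the kind of mismatch described in this issue.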

This links back to this issue.

I have already done some initial work to solve this problem where I work, but I would like to have a proper solution for Lucene.

The root cause is in the FieldQuery flatten, saveTerms and expand methods. They need to exist, but they also scramble the order of the pre/post tags. The termOrPhraseNumber is used to get the pre tag, so it should follow the order of the queries.
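
As a rough idea of the behavior I am after, the sketch below numbers clauses in the order they first appear in the user's query. This is not a patch for FieldQuery: the class, the numberInQueryOrder helper and the use of plain strings in place of Lucene Query objects are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone sketch; strings stand in for Lucene Query clauses.
class QueryOrderNumberingSketch {

  // Hypothetical helper: gives each distinct clause the next unused number
  // the first time it is seen, so numbers follow the order of the query.
  static Map<String, Integer> numberInQueryOrder(List<String> clauses) {
    Map<String, Integer> numbers = new LinkedHashMap<>();
    for (String clause : clauses) {
      numbers.putIfAbsent(clause, numbers.size());
    }
    return numbers;
  }

  // Same modulo lookup as BaseFragmentsBuilder.getPreTag.
  static String getPreTag(String[] preTags, int num) {
    return preTags[num % preTags.length];
  }

  public static void main(String[] args) {
    String[] preTags = {"<ab>", "<cb>", "<ca>"};
    Map<String, Integer> numbers = numberInQueryOrder(List.of("A B", "C B", "C A"));

    // Each clause is paired with the tag at its own position in the query.
    for (Map.Entry<String, Integer> e : numbers.entrySet()) {
      System.out.println(e.getKey() + " -> " + getPreTag(preTags, e.getValue()));
    }
  }
}
```

If the numbers handed to getPreTag were assigned like this, the table above would hold. The open question is how to keep that ordering through flatten, saveTerms and expand.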

I will try to add a unit test that properly illustrates this problem as it is kinda complex.

Version and environment details

Lucene 3.0+

Any environment

MateuxLucax, Oct 19 '24 17:10