Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor
Description
I use Lucene Monitor with regular Document objects and it works just fine. However, I'd like to match documents against the Solr queries I use on a nested collection, and I couldn't get this to work. I'm opening this ticket as a request because I'm not sure whether lucene-monitor officially supports nested queries.
I've tried to match documents with nested queries that need to match fields of both the parent doc and the child doc. To achieve this I tried calling Monitor#match with a Document[], but from what I understand Lucene Monitor doesn't let a query use the "context" of other documents.
I'd like to know whether this can work in any way right now, and if not, what would need to be done to contribute such a feature.
Here is the code I tried:
    @Override
    public void run(ApplicationArguments args) throws Exception {
        MonitorConfiguration monitorConfig = new MonitorConfiguration();
        Monitor monitor = new Monitor(new StandardAnalyzer(), new TermFilteredPresearcher(), monitorConfig);
        registerQueries(monitor);

        // Creating a parent document with child documents
        Document parentDoc = new Document();
        parentDoc.add(new StringField("id", "g1", Field.Store.YES));
        parentDoc.add(new StringField("color", "green", Field.Store.YES));
        parentDoc.add(new StringField("title", "Grass", Field.Store.YES));
        parentDoc.add(new StoredField("isParent", "true"));

        // Creating child document
        Document childDoc1 = new Document();
        childDoc1.add(new TextField("childField1", "childValue1", Field.Store.YES));
        childDoc1.add(new StringField("isParent", "false", Field.Store.YES));

        Document childDoc2 = new Document();
        childDoc2.add(new TextField("childField2", "childValue2", Field.Store.YES));
        childDoc2.add(new StringField("isParent", "false", Field.Store.YES));

        Document[] documents = {parentDoc,childDoc1};
        MultiMatchingQueries<HighlightsMatch> hm = monitor.match(documents, HighlightsMatch.MATCHER);
        log.info("Got " + hm.getMatchCount(0) + " matches");
        hm.getMatches(0).forEach(m -> {
            log.info("Match: " + m.getQueryId() + " with " + m.getHitCount());
            m.getHits("childField1").forEach(h -> {
                log.info(" hit: " + h.toString() + " - " + childDoc1.get("childField1").substring(h.startOffset, h.endOffset));
            });
        });
    }

    private void registerQueries(Monitor monitor) throws IOException, ParseException {
        MonitorQuery monitorQuery = newMonitorQuery("ChildQuery1", "childField1:childValue1", Collections.singletonMap("customer", "123"));
        monitor.register(monitorQuery);
        monitor.register(newMonitorQuery("ChildQuery2", "childField2:childValue2", Collections.singletonMap("customer", "124")));
    }

    private MonitorQuery newMonitorQuery(String id, String queryString, Map<String, String> metadata) throws ParseException {
        QueryParser qp = new QueryParser("childField1", new StandardAnalyzer());
        Query childQuery = qp.parse(queryString);
        // Construct ToParentBlockJoinQuery from child query
        BitSetProducer parentFilter = new QueryBitSetProducer(new TermQuery(new Term("isParent", "true")));
        ToParentBlockJoinQuery parentJoinQuery = new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.None);
        log.info("Registered monitor query " + id);
        return new MonitorQuery(id, parentJoinQuery, queryString, metadata);
    }
@mikemccand maybe you've got an idea?
I tried to follow the instructions that I saw in your blog and produced code as follows:
    @Override
    public void run(ApplicationArguments args) throws Exception {
        MonitorConfiguration monitorConfig = new MonitorConfiguration();
        Monitor monitor = new Monitor(new StandardAnalyzer(), new TermFilteredPresearcher(), monitorConfig);
        registerQueries(monitor);

        // Creating a parent document with child documents
        Document parentDoc = new Document();
        parentDoc.add(new StringField("id", "shirt1", Field.Store.YES));
        parentDoc.add(new StringField("name", "wolf", Field.Store.YES));
        parentDoc.add(new StringField("type", "shirt", Field.Store.YES));

        // Creating child documents (SKUs)
        Document childDoc1 = new Document();
        childDoc1.add(new StringField("size", "small", Field.Store.YES));
        childDoc1.add(new StringField("color", "blue", Field.Store.YES));

        Document childDoc2 = new Document();
        childDoc2.add(new StringField("size", "medium", Field.Store.YES));
        childDoc2.add(new StringField("color", "black", Field.Store.YES));

        Document[] documents = {parentDoc,childDoc1,childDoc2};
        MultiMatchingQueries<HighlightsMatch> hm = monitor.match(documents, HighlightsMatch.MATCHER);
        log.info("Got " + hm.getMatchCount(0) + " matches");
        hm.getMatches(0).forEach(m -> {
            log.info("Match: " + m.getQueryId() + " with " + m.getHitCount());
            m.getHits("name").forEach(h -> {
                log.info(" hit: " + h.toString() + " - " + parentDoc.get("name").substring(h.startOffset, h.endOffset));
            });
        });
    }

    private void registerQueries(Monitor monitor) throws IOException, ParseException {
        MonitorQuery monitorQuery1 = newMonitorQuery("ShirtQuery1", "name:wolf AND size:small AND color:blue", Collections.singletonMap("customer", "123"));
        MonitorQuery monitorQuery2 = newMonitorQuery("ShirtQuery2", "name:wolf AND size:medium AND color:black", Collections.singletonMap("customer", "124"));
        monitor.register(monitorQuery1);
        monitor.register(monitorQuery2);
    }

    private MonitorQuery newMonitorQuery(String id, String queryString, Map<String, String> metadata) throws ParseException {
        QueryParser qp = new QueryParser("name", new StandardAnalyzer());
        Query shirtQuery = qp.parse(queryString.split(" AND ")[0]);
        QueryParser qpChild = new QueryParser("size", new StandardAnalyzer());
        Query sizeQuery = qpChild.parse(queryString.split(" AND ")[1]);
        qpChild = new QueryParser("color", new StandardAnalyzer());
        Query colorQuery = qpChild.parse(queryString.split(" AND ")[2]);
        BooleanQuery childQuery = new BooleanQuery.Builder()
            .add(sizeQuery, BooleanClause.Occur.MUST)
            .add(colorQuery, BooleanClause.Occur.MUST)
            .build();
        // Construct ToParentBlockJoinQuery from child query
        BitSetProducer parentFilter = new QueryBitSetProducer(new TermQuery(new Term("type", "shirt")));
        ToParentBlockJoinQuery parentJoinQuery = new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.None);
        BooleanQuery finalQuery = new BooleanQuery.Builder()
            .add(shirtQuery, BooleanClause.Occur.MUST)
            .add(parentJoinQuery, BooleanClause.Occur.MUST)
            .build();
        log.info("Registered monitor query " + id);
        return new MonitorQuery(id, finalQuery, queryString, metadata);
    }
But it still doesn't match. I followed the blog post on Lucene nested queries.
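One thing worth checking in both snippets is document order. Lucene's block join expects each block to be laid out with the children first and the parent as the last document, and the parent filter has to hit an indexed field (a StoredField, as used for isParent on the parent in the first snippet, is stored but not indexed). Both snippets pass the parent first. The following is a toy model in plain Java, not Lucene code, that mimics the "a child joins to the next parent after it" rule; the names BlockJoinSketch, matchingParents and demo are made up for illustration. Whether Monitor#match keeps a Document[] batch in a layout that ToParentBlockJoinQuery can use is not established in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of block-join semantics over field maps; this is NOT Lucene code. */
class BlockJoinSketch {

    /** Builds a document as a field -> value map from alternating keys and values. */
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> fields = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            fields.put(keyValues[i], keyValues[i + 1]);
        }
        return fields;
    }

    /**
     * Returns the positions of parents that have at least one matching child.
     * A child belongs to the first parent that FOLLOWS it in the list, so a
     * matching child with no parent after it is never reported.
     */
    static List<Integer> matchingParents(List<Map<String, String>> docs,
                                         Predicate<Map<String, String>> isParent,
                                         Predicate<Map<String, String>> childQuery) {
        List<Integer> result = new ArrayList<>();
        boolean pendingChildMatch = false;
        for (int i = 0; i < docs.size(); i++) {
            Map<String, String> d = docs.get(i);
            if (isParent.test(d)) {
                if (pendingChildMatch) {
                    result.add(i);
                }
                pendingChildMatch = false; // a new block starts after each parent
            } else if (childQuery.test(d)) {
                pendingChildMatch = true;
            }
        }
        return result;
    }

    /** Runs the same query against the two possible orderings of one parent and one child. */
    static List<Integer> demo(boolean parentFirst) {
        Map<String, String> parent = doc("id", "g1", "isParent", "true");
        Map<String, String> child = doc("childField1", "childValue1", "isParent", "false");
        Predicate<Map<String, String>> isParent = d -> "true".equals(d.get("isParent"));
        Predicate<Map<String, String>> childQuery = d -> "childValue1".equals(d.get("childField1"));
        List<Map<String, String>> docs =
            parentFirst ? Arrays.asList(parent, child) : Arrays.asList(child, parent);
        return matchingParents(docs, isParent, childQuery);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // prints [] (parent first: the child has no parent after it)
        System.out.println(demo(false)); // prints [1] (child first: the parent at position 1 matches)
    }
}
```

In this model the ordering used in the snippets above (parent first) yields no match, while children-then-parent does.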
@jpountz @benwtrent @javanna I saw that Elasticsearch does have the option of percolating nested queries. I wonder whether it has optimizations similar to Lucene Monitor's, or whether it is just a query that gets executed every X seconds. Solr doesn't have an equivalent. @epugh
Hi, @almogtavor. I briefly looked at it. My conclusion is that it's not trivial. This percolator involves complex machinery. Perhaps it could be implemented via CustomQueryHandler, but I couldn't figure it out quickly. Also, you say BJQ is supported in Elastic's percolator, so you can probably spot an implementation approach there.
@mkhludnev The issue is that lucene-monitor lets you percolate only a single document at a time, so there's no easy way to use BJQ. ES managed to create a percolator service that seems to work quite similarly to lucene-monitor, but it seems that it doesn't support nested queries either (https://github.com/elastic/elasticsearch/issues/2960).
I wonder how hard it would be to implement an option in lucene-monitor that accepts a batch of documents (the same way we can now) but matches them together as one block instead of one document at a time. That way, if a query spans two docs, a parent and a child, the parent doc would be returned.
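The bulk idea above can be illustrated with a small stand-alone sketch, assuming the batch is treated as one block whose parent is known. This is plain Java over field maps, not the lucene-monitor API; BulkMatchSketch, NestedQuery, matchAsBlock and the extra MixedSkuQuery are hypothetical names. A query matches when its parent clause holds for the parent and a single child satisfies the whole child clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of matching a parent/child batch as one unit; NOT the lucene-monitor API. */
class BulkMatchSketch {

    /** A registered query split into a clause on the parent and a clause on one child. */
    static final class NestedQuery {
        final String id;
        final Predicate<Map<String, String>> parentClause;
        final Predicate<Map<String, String>> childClause;

        NestedQuery(String id,
                    Predicate<Map<String, String>> parentClause,
                    Predicate<Map<String, String>> childClause) {
            this.id = id;
            this.parentClause = parentClause;
            this.childClause = childClause;
        }
    }

    /** Builds a document as a field -> value map from alternating keys and values. */
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> fields = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            fields.put(keyValues[i], keyValues[i + 1]);
        }
        return fields;
    }

    /** Exact-match clause on a single field. */
    static Predicate<Map<String, String>> term(String field, String value) {
        return d -> value.equals(d.get(field));
    }

    /** Returns the ids of queries whose parent clause matches and whose child clause matches one child. */
    static List<String> matchAsBlock(Map<String, String> parent,
                                     List<Map<String, String>> children,
                                     List<NestedQuery> queries) {
        List<String> matched = new ArrayList<>();
        for (NestedQuery q : queries) {
            if (q.parentClause.test(parent) && children.stream().anyMatch(q.childClause)) {
                matched.add(q.id);
            }
        }
        return matched;
    }

    /** Uses the shirt/SKU data from the thread plus one query that mixes fields of two SKUs. */
    static List<String> demo() {
        Map<String, String> parent = doc("id", "shirt1", "name", "wolf", "type", "shirt");
        List<Map<String, String>> children = Arrays.asList(
            doc("size", "small", "color", "blue"),
            doc("size", "medium", "color", "black"));
        List<NestedQuery> queries = Arrays.asList(
            new NestedQuery("ShirtQuery1", term("name", "wolf"),
                term("size", "small").and(term("color", "blue"))),
            new NestedQuery("ShirtQuery2", term("name", "wolf"),
                term("size", "medium").and(term("color", "black"))),
            new NestedQuery("MixedSkuQuery", term("name", "wolf"),
                term("size", "small").and(term("color", "black"))));
        return matchAsBlock(parent, children, queries);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [ShirtQuery1, ShirtQuery2]
    }
}
```

MixedSkuQuery does not match because no single SKU is both small and black, which is the per-child correlation that simply flattening all child fields into the parent would lose.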