Add support for ElasticSearch 8
ElasticSearch 8 was released a while ago (February 10, 2022). It's now safe to add support for ElasticSearch 8 into JanusGraph.
In ElasticSearch 8 the Prefix Tree index strategy is no longer supported for geo_shape. Only the BKD index strategy is supported.
While experimenting with the BKD index strategy I found that it doesn't support circle queries, even though Prefix Tree supported them.
I haven't found a good solution to this problem yet. Should we simply disallow circles when ElasticSearch 8 is used?
Or should we transform circles into Polygons via the Circle processor?
Related documentation about geo_shape.
Some issues regarding this problem:
https://discuss.elastic.co/t/geo-shape-circle-query-not-supported/203436
https://discuss.elastic.co/t/circle-geo-shape-query-not-supported-in-es-8-1/300024
The failed tests regarding this issue can be found here: https://github.com/porunov/janusgraph/runs/6132913303?check_suite_focus=true#step:6:115
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Unknown exception while executing index operation
Caused by: java.io.IOException: Failure(s) in Elasticsearch bulk request: [{type=mapper_parsing_exception, reason=failed to parse, caused_by={type=unsupported_operation_exception, reason=CIRCLE geometry is not supported}}]
Topic about circles: https://www.elastic.co/blog/this-week-in-elasticsearch-and-apache-lucene-2019-08-26
Here is the code they use to compute the number of sides of the Polygon when transforming a circle: https://github.com/elastic/elasticsearch/blob/0699c9351f1439e246d408fd6538deafde4087b6/x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/ingest/CircleProcessor.java#L139
Here is the actual class which transforms a number of sides + radius into a Polygon: https://github.com/elastic/elasticsearch/blob/0699c9351f1439e246d408fd6538deafde4087b6/x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/SpatialUtils.java
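To make the conversion concrete, here is a rough Java sketch of the idea. It is not the ElasticSearch code linked above. It uses the plain geometric bound for a regular polygon inscribed in a flat circle (the largest gap is r * (1 - cos(pi / n))) and a spherical Earth model, so the real implementation may choose a different number of sides. The class name, the method names and the MIN_SIDES / MAX_SIDES clamps are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: approximates a geo circle by a regular polygon. All names and limits are hypothetical. */
final class CircleToPolygonSketch {
    static final double EARTH_RADIUS_METERS = 6_371_008.8; // mean Earth radius, spherical model
    static final int MIN_SIDES = 4;    // hypothetical lower clamp
    static final int MAX_SIDES = 1000; // hypothetical upper clamp

    /** Fewest sides for which an inscribed regular polygon stays within errorDistance of the circle. */
    static int numberOfSides(double radiusMeters, double errorDistanceMeters) {
        if (radiusMeters <= 0 || errorDistanceMeters <= 0) {
            throw new IllegalArgumentException("radius and error distance must be positive");
        }
        if (errorDistanceMeters >= radiusMeters) {
            return MIN_SIDES; // any polygon is within tolerance, use the coarsest one
        }
        // Solve r * (1 - cos(pi / n)) <= e for n (flat-plane approximation).
        int sides = (int) Math.ceil(Math.PI / Math.acos(1 - errorDistanceMeters / radiusMeters));
        return Math.min(MAX_SIDES, Math.max(MIN_SIDES, sides));
    }

    /**
     * Returns a closed ring of {lon, lat} pairs in degrees, counter-clockwise,
     * with the first and last point identical. Circles that touch a pole or
     * cross the date line are not handled in this sketch.
     */
    static List<double[]> toPolygon(double centerLat, double centerLon, double radiusMeters, int sides) {
        double lat1 = Math.toRadians(centerLat);
        double lon1 = Math.toRadians(centerLon);
        double d = radiusMeters / EARTH_RADIUS_METERS; // angular distance on the sphere
        List<double[]> ring = new ArrayList<>(sides + 1);
        for (int i = 0; i < sides; i++) {
            double bearing = -2 * Math.PI * i / sides; // north, west, south, east
            // Standard great-circle destination point for a given bearing and distance.
            double lat2 = Math.asin(Math.sin(lat1) * Math.cos(d)
                    + Math.cos(lat1) * Math.sin(d) * Math.cos(bearing));
            double lon2 = lon1 + Math.atan2(Math.sin(bearing) * Math.sin(d) * Math.cos(lat1),
                    Math.cos(d) - Math.sin(lat1) * Math.sin(lat2));
            ring.add(new double[] {Math.toDegrees(lon2), Math.toDegrees(lat2)});
        }
        ring.add(ring.get(0).clone()); // close the ring
        return ring;
    }
}
```

With this bound, the number of vertices grows as error_distance shrinks relative to the radius, which is the accuracy versus polygon size trade-off discussed in this issue.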
I see 2 options for using the Circle Processor.
- Use it as it is. It will be fully managed by ElasticSearch. When we write a Circle into ElasticSearch, it will transform that Circle into a Polygon. At this point we use ElasticSearch only to query the id of indexed elements. We never return actual data from ElasticSearch. Thus, we don't need to know the initial type of that Geoshape in ElasticSearch, because the initial type is stored in the storage layer (i.e. Cassandra, HBase, etc.), and when we return it from Cassandra it will have the correct Circle type instead of the Polygon type (as stored in ElasticSearch). There is an open task, #1681, to optimize data retrieval by returning some data from ElasticSearch, so later it might be useful to store some additional metadata on geo_shape fields (like a flag which tells us whether it's a circle or not), but we can think about that later. Just for reference, we can use the following technique to store metadata in a separate field: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-field-meta.html . The advantage of this solution is that it's fully managed by ElasticSearch and we don't need to add a new dependency to JanusGraph to get access to the following code: https://github.com/elastic/elasticsearch/blob/0699c9351f1439e246d408fd6538deafde4087b6/x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/SpatialUtils.java . The downside is that it looks like the Circle Processor is initialized only once with a fixed error_distance (see docs), which never changes. If we want to change error_distance dynamically by some custom strategy (for example, use a small error_distance for small circles and a big error_distance for big circles to balance accuracy against the size of the generated Polygon), we have no way to do that. We could let users configure their preferred error_distance at JanusGraph startup, and that's it. It will be fixed for all geo_shape fields.
- Add a dependency which contains that part of the code (not sure if a dependency with this code is available) and transform all Circles which use the BKD index strategy into Polygons on the JanusGraph side during the mutation process. The advantage is that we can introduce our own custom strategies which change error_distance dynamically depending on the data inserted. Basically, we will have more control over what really goes into ElasticSearch. The disadvantage is that we need to introduce an additional dependency into JanusGraph which has this code (I don't know yet which dependency contains it), and we will also need to extend the conversion logic in ElasticSearchIndex to convert a Circle into a Polygon based on some rules.
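The difference between the two options can be sketched roughly as follows. For option 1, the first method builds the body of an ingest pipeline that uses the circle processor with one fixed error_distance; the field name is whatever geo_shape field JanusGraph writes, and the pipeline would still have to be registered in ElasticSearch and applied to the index. For option 2, the second method is a purely hypothetical strategy that scales error_distance with the circle radius and clamps it. The class name, method names and all parameter values are assumptions for illustration, not existing JanusGraph or ElasticSearch APIs.

```java
import java.util.Locale;

/** Sketch only: contrasts a fixed error_distance (option 1) with a per-circle one (option 2). */
final class CircleOptionsSketch {

    /**
     * Option 1: JSON body of an ingest pipeline with a circle processor.
     * error_distance is fixed for every document that passes through the pipeline.
     * Assumes `field` is a plain field name that needs no JSON escaping.
     */
    static String circlePipelineBody(String field, double errorDistanceMeters) {
        return String.format(Locale.ROOT,
                "{\"processors\":[{\"circle\":{\"field\":\"%s\","
                        + "\"error_distance\":%s,\"shape_type\":\"geo_shape\"}}]}",
                field, errorDistanceMeters);
    }

    /**
     * Option 2: hypothetical strategy computed on the JanusGraph side.
     * error_distance is a fraction of the radius, clamped to [minMeters, maxMeters],
     * so small circles stay accurate and big circles do not produce huge Polygons.
     */
    static double dynamicErrorDistance(double radiusMeters, double relativeError,
                                       double minMeters, double maxMeters) {
        if (radiusMeters <= 0 || relativeError <= 0 || minMeters > maxMeters) {
            throw new IllegalArgumentException("invalid radius, relative error or bounds");
        }
        return Math.min(maxMeters, Math.max(minMeters, radiusMeters * relativeError));
    }
}
```

For example, dynamicErrorDistance(radius, 0.01, 1.0, 500.0) would allow a 1% deviation while never going below 1 meter or above 500 meters; with option 1 the same number would apply to every circle regardless of its size.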
General disadvantages of using Circle Processor (doesn't matter if that's solution 1 or 2):
- We lose transparency of what we actually store in ElasticSearch. People who try to query ElasticSearch directly will notice that instead of their Circle objects we store Polygon objects.
- Accuracy is worse than with a native Circle indexed using the Prefix Tree index strategy (that said, the Prefix Tree index strategy isn't available in ElasticSearch starting from version 8, so using it is not an option going forward).
- We need to balance accuracy (small error_distance) against the size of the generated Polygon (a big error_distance keeps it small).
It would be great to hear thoughts on other possible solutions, or comments on these 2 solutions (using the Circle Processor).
Opened a feature request issue in ElasticSearch to make that part of the code available via maven artifacts: https://github.com/elastic/elasticsearch/issues/86607