
Add support for ElasticSearch 8

Open porunov opened this issue 2 years ago • 4 comments

ElasticSearch 8 was released a while ago (February 10, 2022), so it's now safe to add ElasticSearch 8 support to JanusGraph.

porunov, Apr 22 '22 11:04

In ElasticSearch 8, the Prefix Tree index strategy is no longer supported for geo_shape; only the BKD index strategy is. Playing with the BKD indexing strategy, I found out that it doesn't support circle queries, even though Prefix Tree did. I haven't found a good solution to this problem yet. Should we simply disallow circles when ElasticSearch 8 is used? Or should we transform circles into Polygons via the Circle processor? Related documentation about geo_shape. Some issues regarding this problem:
https://discuss.elastic.co/t/geo-shape-circle-query-not-supported/203436
https://discuss.elastic.co/t/circle-geo-shape-query-not-supported-in-es-8-1/300024

The failing tests for this issue can be found here: https://github.com/porunov/janusgraph/runs/6132913303?check_suite_focus=true#step:6:115

Caused by: org.janusgraph.diskstorage.PermanentBackendException: Unknown exception while executing index operation
Caused by: java.io.IOException: Failure(s) in Elasticsearch bulk request: [{type=mapper_parsing_exception, reason=failed to parse, caused_by={type=unsupported_operation_exception, reason=CIRCLE geometry is not supported}}]

Related blog post about circles: https://www.elastic.co/blog/this-week-in-elasticsearch-and-apache-lucene-2019-08-26
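The "disallow or convert" question above can be framed as a version-gated check on the JanusGraph side. This is a sketch only: `ShapeType`, `CirclePolicy` and `needsPolygonConversion` are hypothetical names, not existing JanusGraph classes, and the version rule simply encodes this thread's observation that Prefix Tree (which accepted circles) is unavailable from ElasticSearch 8 on.

```java
public final class CircleSupportCheck {
    /** Hypothetical stand-in for the geoshape types JanusGraph can index (subset only). */
    enum ShapeType { POINT, CIRCLE, BOX, LINE, POLYGON }

    /** Hypothetical policy: what to do with circles when the backend cannot index them. */
    enum CirclePolicy { REJECT, CONVERT_TO_POLYGON }

    private CircleSupportCheck() {}

    /**
     * Simplification: assumes that on ES < 8 the mapping uses the Prefix Tree strategy,
     * and that from ES 8 on only BKD (no circles) is available.
     */
    static boolean supportsNativeCircles(int esMajorVersion) {
        return esMajorVersion < 8;
    }

    /**
     * Returns true if the shape must be converted to a polygon before indexing,
     * false if it can be sent as-is. Throws if the policy is REJECT and the
     * backend cannot index circles.
     */
    static boolean needsPolygonConversion(ShapeType type, int esMajorVersion, CirclePolicy policy) {
        if (type != ShapeType.CIRCLE || supportsNativeCircles(esMajorVersion)) {
            return false;
        }
        if (policy == CirclePolicy.REJECT) {
            throw new IllegalArgumentException(
                    "Circle geoshapes cannot be indexed with the BKD strategy on ElasticSearch "
                            + esMajorVersion);
        }
        return true;
    }
}
```

With `REJECT`, the error would surface at mutation time in JanusGraph rather than as a `mapper_parsing_exception` from the bulk request.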

porunov, Apr 22 '22 19:04

Here is the code Elasticsearch uses to compute the number of sides of the Polygon when transforming a circle: https://github.com/elastic/elasticsearch/blob/0699c9351f1439e246d408fd6538deafde4087b6/x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/ingest/CircleProcessor.java#L139

Here is the class that actually transforms a number of sides and a radius into a Polygon: https://github.com/elastic/elasticsearch/blob/0699c9351f1439e246d408fd6538deafde4087b6/x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/SpatialUtils.java
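To make the circle-to-polygon idea concrete without depending on the Elasticsearch classes linked above, here is an independent sketch. It is not a port of `CircleProcessor` or `SpatialUtils`. It picks the smallest number of sides whose inscribed regular polygon stays within an error distance of the circle (using the planar sagitta `r * (1 - cos(pi / n))` as the deviation), then places the vertices with the standard spherical destination-point formula. Assumptions: a spherical Earth with mean radius 6,371,008.8 m, and `minSides`/`maxSides` as hypothetical caller-chosen bounds.

```java
import java.util.ArrayList;
import java.util.List;

public final class CircleToPolygon {
    // Assumption: spherical Earth with the mean radius; Elasticsearch may use a different model.
    static final double EARTH_RADIUS_METERS = 6_371_008.8;

    private CircleToPolygon() {}

    /**
     * Smallest n such that the sagitta r * (1 - cos(pi / n)) is at most errorDistance,
     * clamped to [minSides, maxSides].
     */
    static int numSides(double radiusMeters, double errorDistanceMeters, int minSides, int maxSides) {
        if (radiusMeters <= 0 || errorDistanceMeters <= 0) {
            throw new IllegalArgumentException("radius and error distance must be positive");
        }
        if (errorDistanceMeters >= radiusMeters) {
            return minSides; // any inscribed polygon with >= 3 sides deviates by at most r / 2
        }
        double n = Math.ceil(Math.PI / Math.acos(1.0 - errorDistanceMeters / radiusMeters));
        return (int) Math.max(minSides, Math.min(maxSides, n));
    }

    /**
     * Closed ring of [lon, lat] vertices in degrees, walking counter-clockwise
     * (north, west, south, east), with the first vertex repeated at the end.
     */
    static List<double[]> toPolygon(double centerLat, double centerLon, double radiusMeters, int sides) {
        double lat1 = Math.toRadians(centerLat);
        double lon1 = Math.toRadians(centerLon);
        double delta = radiusMeters / EARTH_RADIUS_METERS; // angular distance on the sphere
        List<double[]> ring = new ArrayList<>(sides + 1);
        for (int i = 0; i < sides; i++) {
            double bearing = -2.0 * Math.PI * i / sides; // decreasing bearing = counter-clockwise
            double lat2 = Math.asin(Math.sin(lat1) * Math.cos(delta)
                    + Math.cos(lat1) * Math.sin(delta) * Math.cos(bearing));
            double lon2 = lon1 + Math.atan2(Math.sin(bearing) * Math.sin(delta) * Math.cos(lat1),
                    Math.cos(delta) - Math.sin(lat1) * Math.sin(lat2));
            double lonDeg = ((Math.toDegrees(lon2) + 540.0) % 360.0) - 180.0; // wrap to [-180, 180)
            ring.add(new double[] {lonDeg, Math.toDegrees(lat2)});
        }
        ring.add(ring.get(0).clone()); // close the ring
        return ring;
    }

    /** Haversine distance on the same sphere, handy for checking the generated vertices. */
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

Typical use would be `toPolygon(lat, lon, r, numSides(r, e, 4, 1000))`. Keeping the side count and the vertex placement in separate functions is what would let a custom error-distance strategy plug in later.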

porunov, Apr 22 '22 22:04

I see two possible ways of using the Circle Processor:

  1. Use it as it is, fully managed by ElasticSearch: when we write a Circle into ElasticSearch, the processor transforms it into a Polygon. Currently we only use ElasticSearch to query the ids of indexed elements; we never return actual data from it. Thus, we don't need to know the original type of a Geoshape in ElasticSearch, because that type is kept in the storage layer (i.e. Cassandra, HBase, etc.), and when we read the value back from Cassandra it has the correct Circle type instead of the Polygon type stored in ElasticSearch. There is an open task, #1681, to optimize data retrieval by returning some data from ElasticSearch, so later it might be useful to store additional metadata alongside geo_shape fields (like a flag that tells us whether the shape is a circle), but we can think about that later. For reference, the following technique can be used to store metadata in a separate field: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-field-meta.html . The advantage of this solution is that it's fully managed by ElasticSearch, and we don't need to add a new dependency to JanusGraph to get access to the following code: https://github.com/elastic/elasticsearch/blob/0699c9351f1439e246d408fd6538deafde4087b6/x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/SpatialUtils.java . The downside is that the Circle Processor seems to be initialized only once with a fixed error_distance (see docs) that never changes. If we want to change error_distance dynamically with a custom strategy (for example, a small error_distance for small circles and a big error_distance for big circles, to balance accuracy against the size of the generated Polygon), this option doesn't allow it. We could let users configure their preferred error_distance at JanusGraph startup, and that's it: it would be fixed for all geo_shape fields.
  2. Add a dependency that contains that part of the code (I'm not sure such a dependency is available) and transform all Circles that use the BKD index strategy into Polygons on the JanusGraph side during the mutation process. The advantage is that we can introduce custom strategies that change error_distance dynamically depending on the inserted data. Basically, we will have more control over what actually goes into ElasticSearch. The disadvantage is that we need to add a dependency to JanusGraph that contains this code (I don't know yet which one does), and we will also need to extend the conversion logic in ElasticSearchIndex to convert a Circle into a Polygon based on some rules.
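Option 1 boils down to an ingest pipeline with one `circle` processor per geo_shape field, attached to the index (for example through the `index.default_pipeline` index setting). The sketch below only builds the request path and JSON body for `PUT _ingest/pipeline/<id>`; sending it (e.g. with the Elasticsearch REST client) is left out. The pipeline id `janusgraph-circles`, the field names and the 28 m error distance are made-up examples. `field`, `error_distance`, `shape_type` and `ignore_missing` are the processor options as I understand them and should be checked against the circle processor docs for the targeted version, including how the processor treats fields that hold non-circle shapes.

```java
import java.util.List;
import java.util.Locale;

public final class CirclePipelineSketch {
    private CirclePipelineSketch() {}

    /** Path of the ingest pipeline API; the body below would be sent with HTTP PUT. */
    static String pipelinePath(String pipelineId) {
        return "/_ingest/pipeline/" + pipelineId;
    }

    /**
     * One "circle" processor per geo_shape field, all sharing the same fixed
     * error_distance (the limitation of option 1 discussed above).
     */
    static String pipelineBody(List<String> geoShapeFields, double errorDistanceMeters) {
        if (!(errorDistanceMeters > 0) || Double.isInfinite(errorDistanceMeters)) {
            throw new IllegalArgumentException("error distance must be a positive finite number");
        }
        StringBuilder body = new StringBuilder(
                "{\"description\":\"Convert circles to polygons\",\"processors\":[");
        for (int i = 0; i < geoShapeFields.size(); i++) {
            if (i > 0) {
                body.append(',');
            }
            body.append(String.format(Locale.ROOT,
                    "{\"circle\":{\"field\":\"%s\",\"error_distance\":%s,"
                            + "\"shape_type\":\"geo_shape\",\"ignore_missing\":true}}",
                    jsonEscape(geoShapeFields.get(i)), Double.toString(errorDistanceMeters)));
        }
        return body.append("]}").toString();
    }

    /** Minimal escaping of quotes and backslashes; enough for typical field names. */
    private static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Because `errorDistanceMeters` is a single argument, every field and every circle gets the same tolerance, which is exactly the constraint option 2 would lift.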

General disadvantages of using the Circle Processor (whether with solution 1 or 2):

  • We lose transparency of what we actually store in ElasticSearch. People who try to query ElasticSearch directly will notice that instead of their Circle objects we store Polygon objects.
  • Accuracy is worse than with a real Circle and the Prefix Tree index strategy (although Prefix Tree isn't available in ElasticSearch starting from version 8, so it isn't an option going forward).
  • We need to balance accuracy (a small error_distance) against the size of the generated Polygon (kept small by a big error_distance).
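To put rough numbers on the last trade-off: under a planar approximation (a back-of-the-envelope bound, not necessarily the formula Elasticsearch uses), a regular n-gon inscribed in a circle of radius r deviates from the circle by at most the sagitta r(1 − cos(π/n)), so an error distance e requires

```latex
r\left(1-\cos\frac{\pi}{n}\right) \le e
\quad\Longleftrightarrow\quad
n \ge \frac{\pi}{\arccos\left(1 - e/r\right)} \approx \pi\sqrt{\frac{r}{2e}} \qquad (e \ll r)
```

For r = 1000 m this gives n = 23 for e = 10 m, n = 71 for e = 1 m and n = 223 for e = 0.1 m, so each tenfold tightening of error_distance costs roughly √10 ≈ 3.2 times as many vertices. A strategy that keeps e/r fixed (the kind of dynamic error_distance option 2 would allow) keeps the vertex count the same for small and large circles.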

It would be great to hear thoughts on other possible solutions, or comments on these two solutions (using the Circle Processor).

porunov, Apr 22 '22 23:04

I opened a feature request in ElasticSearch to make that part of the code available as a Maven artifact: https://github.com/elastic/elasticsearch/issues/86607

porunov, May 10 '22 12:05