Support the Task API

Open sothawo opened this issue 2 years ago • 2 comments

Spring Data Elasticsearch should add support for the Task API (https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html):

  • add methods to the ClusterOperations according to the ES API
  • add methods like `submitDelete(Query)` (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api) that create tasks
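
For illustration, the REST flow that a task-creating delete would wrap can be sketched as below. The paths are those of the Elasticsearch REST API: a delete-by-query sent with `wait_for_completion=false` returns a task ID of the form `nodeId:taskNumber`, which can then be checked or cancelled under `/_tasks`. The class `TaskApiPaths`, the type `TaskRef`, and all method names are hypothetical and not part of Spring Data Elasticsearch; the index name is assumed to be URL-safe, and the example task ID is made up.

```java
/** Sketch only: class and method names are hypothetical; the REST paths are those of the Elasticsearch API. */
final class TaskApiPaths {

    /** Task IDs returned by Elasticsearch have the form "<nodeId>:<taskNumber>". */
    static final class TaskRef {
        final String nodeId;
        final long taskNumber;

        TaskRef(String nodeId, long taskNumber) {
            this.nodeId = nodeId;
            this.taskNumber = taskNumber;
        }

        /** Splits at the last colon; rejects input with a missing node ID or task number. */
        static TaskRef parse(String raw) {
            int sep = raw.lastIndexOf(':');
            if (sep <= 0 || sep == raw.length() - 1) {
                throw new IllegalArgumentException("not a task id: " + raw);
            }
            return new TaskRef(raw.substring(0, sep), Long.parseLong(raw.substring(sep + 1)));
        }

        @Override
        public String toString() {
            return nodeId + ":" + taskNumber;
        }
    }

    /** POST target (with a query body) that starts the delete and answers with {"task": "..."}. */
    static String submitDeletePath(String index) {
        // Assumption: the index name needs no URL encoding
        return "/" + index + "/_delete_by_query?wait_for_completion=false";
    }

    /** GET target that reports the task, including a "completed" flag. */
    static String statusPath(TaskRef task) {
        return "/_tasks/" + task;
    }

    /** POST target that asks Elasticsearch to cancel the task. */
    static String cancelPath(TaskRef task) {
        return "/_tasks/" + task + "/_cancel";
    }

    public static void main(String[] args) {
        TaskRef task = TaskRef.parse("oTUltX4IQMOUUVeiohTt8A:12345"); // made-up task ID
        System.out.println(submitDeletePath("my-index"));
        System.out.println(statusPath(task));
        System.out.println(cancelPath(task));
    }
}
```

A `submitDelete(Query)` method would presumably hide these three calls behind a typed return value rather than expose the raw paths.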

Aug 26 '21 15:08 sothawo

Could we talk a little bit about this?

> add methods to the ClusterOperations according to the ES API

Which methods do you mean? Methods that return all active tasks with details?

> add methods like submitDelete(Query)

These methods should be placed in DocumentOperations, I guess?

What should the shape of the Task class be? I think it could be something like this:

import java.util.List;
import org.elasticsearch.tasks.TaskId;

class Task {
    private String node;
    private List<ChildTask> childTasks;
    // and a few more properties I think
}

class ChildTask {
    private TaskId taskId; // org.elasticsearch.tasks.TaskId
    // and a few more properties I think
}

I would also like to talk about how someone should check whether a task is done, and how to cancel a particular task. I mean the high-level usage of this feature.

Nov 09 '21 07:11 piotrooo

I just had a look at the code of the RestHighLevelClient in version 7.15.1. There the tasks API is implemented in its own TasksClient (https://github.com/elastic/elasticsearch/blob/master/client/rest-high-level/src/main/java/org/elasticsearch/client/TasksClient.java). Therefore, in Spring Data Elasticsearch this should be defined in a (Reactive)TasksOperations interface and implemented in (Reactive)TasksTemplate classes that integrate with the existing implementations.
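
As a rough sketch of what the imperative side of such an interface might look like, and of how a caller could check for completion or cancel a task: every name below (`TasksOperations`, `isCompleted`, `cancel`, `awaitCompletion`, `InMemoryTasksTemplate`) is an assumption made for illustration, and nothing here exists in Spring Data Elasticsearch. The in-memory class only stands in for a template that would talk to the cluster.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical imperative API; none of these names exist in Spring Data Elasticsearch. */
interface TasksOperations {

    /** Returns true once the task with the given ID has finished. */
    boolean isCompleted(String taskId);

    /** Requests cancellation; returns true if the task was known and still running. */
    boolean cancel(String taskId);

    /** Polls isCompleted until it returns true, the timeout elapses, or the thread is interrupted. */
    default boolean awaitCompletion(String taskId, Duration timeout, Duration pollInterval) {
        long start = System.nanoTime();
        while (!isCompleted(taskId)) {
            if (System.nanoTime() - start >= timeout.toNanos()) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
                return false;
            }
        }
        return true;
    }
}

/** In-memory stand-in used only to exercise the interface; a real template would call the cluster. */
final class InMemoryTasksTemplate implements TasksOperations {

    // task ID -> completed flag
    private final Map<String, Boolean> completedByTaskId = new ConcurrentHashMap<>();

    void register(String taskId) {
        completedByTaskId.put(taskId, false);
    }

    void complete(String taskId) {
        completedByTaskId.replace(taskId, true);
    }

    @Override
    public boolean isCompleted(String taskId) {
        // Design assumption of this sketch: unknown tasks count as finished
        return completedByTaskId.getOrDefault(taskId, true);
    }

    @Override
    public boolean cancel(String taskId) {
        // remove(key, value) only succeeds while the task is still marked as running
        return completedByTaskId.remove(taskId, false);
    }

    public static void main(String[] args) {
        InMemoryTasksTemplate tasks = new InMemoryTasksTemplate();
        tasks.register("node-1:1");
        tasks.complete("node-1:1");
        System.out.println(tasks.awaitCompletion("node-1:1", Duration.ofSeconds(1), Duration.ofMillis(50)));
    }
}
```

A reactive counterpart would presumably return `Mono<Boolean>` from the same methods instead of blocking in a polling loop.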

As to what properties should be in a task, this should reflect the data that is available in the org.elasticsearch.client.tasks package of the RestHighLevelClient.
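
To make the property question concrete, here is a minimal value class whose fields mirror the JSON that the task management REST API returns for a single task (`node`, `id`, `action`, `description`, `start_time_in_millis`, `running_time_in_nanos`, `cancellable`, `parent_task_id`). This is an assumption based on the REST response, not a copy of the classes in `org.elasticsearch.client.tasks`; the class name `TaskInfoSketch` and its accessor names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical value class; the commented names are the JSON fields of the task API response. */
final class TaskInfoSketch {
    private final String node;             // "node"
    private final long id;                 // "id"
    private final String action;           // "action"
    private final String description;      // "description", may be null
    private final long startTimeInMillis;  // "start_time_in_millis"
    private final long runningTimeInNanos; // "running_time_in_nanos"
    private final boolean cancellable;     // "cancellable"
    private final String parentTaskId;     // "parent_task_id", null for top-level tasks

    TaskInfoSketch(String node, long id, String action, String description,
                   long startTimeInMillis, long runningTimeInNanos,
                   boolean cancellable, String parentTaskId) {
        this.node = node;
        this.id = id;
        this.action = action;
        this.description = description;
        this.startTimeInMillis = startTimeInMillis;
        this.runningTimeInNanos = runningTimeInNanos;
        this.cancellable = cancellable;
        this.parentTaskId = parentTaskId;
    }

    /** Full task ID in the "<node>:<id>" form used by the REST API. */
    String taskId() { return node + ":" + id; }

    String action() { return action; }

    String description() { return description; }

    Instant startTime() { return Instant.ofEpochMilli(startTimeInMillis); }

    Duration runningTime() { return Duration.ofNanos(runningTimeInNanos); }

    boolean cancellable() { return cancellable; }

    /** A task is a child task when it reports a parent task ID. */
    boolean isChildTask() { return parentTaskId != null; }

    public static void main(String[] args) {
        TaskInfoSketch task = new TaskInfoSketch("nodeA", 42L,
                "indices:data/write/delete/byquery", null, 1_000L, 2_000_000L, true, null);
        System.out.println(task.taskId() + " running for " + task.runningTime().toMillis() + " ms");
    }
}
```

Modelling the parent as an optional ID instead of a nested list of child tasks keeps the class flat; a grouped view per node could be built on top of it.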

But I currently see two problems.

  1. The first is that the tasks API is still marked as a beta feature (https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html).
  2. The tasks API was introduced into the client code after version 7.10. This means that it is not licensed with the Apache2 license. While we can use the classes from the RestHighLevelClient to access that API, we cannot easily implement this in the reactive code. For the reactive code, Spring Data Elasticsearch contains modified copies of some classes from the core Elasticsearch libraries for creating request objects and converting response objects. It was no problem to copy and modify that code when we implemented the reactive code, but we cannot take the new parts needed for the tasks API like before, because these now have a different license.

Elasticsearch is working on providing a new client (https://github.com/elastic/elasticsearch-java), and I am currently working on integrating this client, first as an alternative and later as a replacement for the RestHighLevelClient. The request and response classes from this new client are Apache2 licensed, and Spring Data Elasticsearch will use these for both imperative and reactive code.

So I would defer implementing the tasks API in Spring Data Elasticsearch until the new client is integrated; I would not want to have it implemented only for the imperative code but not for the reactive code.

Nov 09 '21 20:11 sothawo

In the current 8.12 version this feature is still in beta status. I'll close this issue for now. When the Task API is stable, we should create a new issue.

Feb 17 '24 16:02 sothawo