opensearch-hadoop [FEATURE] Add OpenSearch-Java transport as an option


Is your feature request related to a problem?

Currently the hadoop client uses its own custom RestClient to make requests to an OpenSearch cluster. While this works today, we’d like to let users choose between that and the optional ApacheHttpClient5Transport that ships with the OpenSearch-Java client. To do this, we would need to add a new transport option and import the java client into hadoop.

Adding the OpenSearch-Java transport as an option lets us build on top of the upstream client and add new features such as backpressure awareness and the others described in https://github.com/opensearch-project/opensearch-clients/issues/27

What solution would you like?

This diagram illustrates how the java client and hadoop client make requests to OpenSearch today:

[Diagram: how the java client and the hadoop client make requests to OpenSearch today]

The hadoop client simply translates incoming queries into a SimpleHTTP request, using the URL, path, query parameters, method, and headers to construct it. OpenSearch-Java, on the other hand, uses a Request/Response builder pattern and exposes those builders to the caller.

Approach 1

One approach would be to add a new abstraction layer in OpenSearch-Java that abstracts away the GET/POST/PUT/DELETE methods. The hadoop client would then use the appropriate class, construct the request, and let the java client handle everything else. This is also requested in https://github.com/opensearch-project/opensearch-java/issues/377

[Diagram: request flow for Approach 1]

Pros:

Would make it easier for the hadoop client to leverage OpenSearch-Java without having to explicitly parse the Request and Response classes.
Would let people use the client as is for existing APIs and also hit OpenSearch endpoints that the client does not support today, e.g. plugin APIs.

Cons:

Providing an easy REST layer could overshadow the typed client endpoints and lead developers to bypass them, which adds maintenance overhead and may go against good design practice.
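To make Approach 1 more concrete, here is a minimal sketch of what a generic request layer could look like. Every name in it (GenericRequest, GenericResponse, GenericTransport, EchoTransport) is hypothetical and not part of OpenSearch-Java today; the in-memory EchoTransport only stands in for a real HTTP transport so the sketch runs without a cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical generic request: the same pieces the hadoop client already has. */
final class GenericRequest {
    final String method;
    final String path;
    final Map<String, String> params;
    final byte[] body;

    GenericRequest(String method, String path, Map<String, String> params, byte[] body) {
        this.method = method;
        this.path = path;
        this.params = new LinkedHashMap<>(params);
        this.body = body == null ? new byte[0] : body.clone();
    }

    /** Renders "METHOD /path?k=v". No URL encoding is applied in this sketch. */
    String requestLine() {
        StringBuilder sb = new StringBuilder(method).append(' ').append(path);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(e.getKey()).append('=').append(e.getValue());
            separator = '&';
        }
        return sb.toString();
    }
}

/** Hypothetical generic response: status code plus raw body bytes. */
final class GenericResponse {
    final int status;
    final byte[] body;

    GenericResponse(int status, byte[] body) {
        this.status = status;
        this.body = body;
    }
}

/** Hypothetical interface the java client could expose for untyped calls. */
interface GenericTransport {
    GenericResponse execute(GenericRequest request) throws java.io.IOException;
}

/** In-memory stand-in: echoes the request line back instead of calling a cluster. */
final class EchoTransport implements GenericTransport {
    @Override
    public GenericResponse execute(GenericRequest request) {
        byte[] payload = request.requestLine().getBytes(StandardCharsets.UTF_8);
        return new GenericResponse(200, payload);
    }
}
```

With a layer shaped like this, the hadoop client would only need to map its existing method, path, params, and body onto GenericRequest; the typed APIs could in principle be built on the same interface.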

Approach 2

Parse the incoming request at the hadoop layer and use the appropriate OpenSearch-Java request and response class to send the request.

Implementation questions:

What’s a good design pattern for this?

Example design pattern:

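import java.util.Arrays;

import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.core.BulkRequest;
import org.opensearch.client.opensearch.core.BulkResponse;
import org.opensearch.client.opensearch.core.SearchRequest;
import org.opensearch.client.opensearch.core.SearchResponse;
import org.opensearch.client.transport.OpenSearchTransport;
import org.opensearch.client.transport.rest_client.RestClientTransport;
// Request.Method and ByteSequence are the hadoop client's own types.

public class JavaClientTransport {

    public void executeRequest(Request.Method method, CharSequence uri, CharSequence path, CharSequence params, ByteSequence body, String operationType) throws Exception {
        // Placeholder host, port, and scheme; try-with-resources closes the low-level client.
        try (RestClient restClient = RestClient.builder(new HttpHost("endpoint", 9200, "protocol")).build()) {
            // Create client
            OpenSearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
            OpenSearchClient client = new OpenSearchClient(transport);
            switch (operationType) {
                case "bulk":
                    // We seem to have all the information needed here; the body still has to be converted into operations for the BulkRequest
                    BulkRequest bulkRequest = new BulkRequest.Builder().index("index").build();
                    BulkResponse bulkResponse = client.bulk(bulkRequest);
                    break; // without this, execution falls through into "search"
                case "search":
                    SearchRequest searchRequest = new SearchRequest.Builder().index(Arrays.asList("index")).build();
                    // Pass a document class for deserializing hits rather than null
                    SearchResponse<Object> searchResponse = client.search(searchRequest, Object.class);
                    break;
                default:
                    throw new Exception("No matching path found");
            }
        }
    }
}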
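The example above takes operationType as a parameter, but the hadoop client only has the method and path at hand. One possible way to derive the operation type, sketched under the assumption that the last path segment identifies the API (_bulk, _search), is shown below. OperationType and OperationClassifier are hypothetical names, and only the two operations from the example are covered.

```java
import java.util.Locale;

/** Hypothetical set of operations the transport knows how to type. */
enum OperationType { BULK, SEARCH, UNKNOWN }

final class OperationClassifier {
    private OperationClassifier() {}

    /** Guesses the operation from the HTTP method and the last path segment. */
    static OperationType classify(String method, String path) {
        String m = method.toUpperCase(Locale.ROOT);

        // Drop any query string, then any trailing slashes.
        int query = path.indexOf('?');
        String p = query >= 0 ? path.substring(0, query) : path;
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }

        // The last segment names the API, e.g. "/my-index/_bulk" -> "_bulk".
        String last = p.substring(p.lastIndexOf('/') + 1);

        if (last.equals("_bulk") && (m.equals("POST") || m.equals("PUT"))) {
            return OperationType.BULK;
        }
        if (last.equals("_search") && (m.equals("GET") || m.equals("POST"))) {
            return OperationType.SEARCH;
        }
        // Anything else (scroll, index management, plugin APIs, ...) is not typed here.
        return OperationType.UNKNOWN;
    }
}
```

Anything classified as UNKNOWN would still need a fallback, which is where a generic request layer like the one in Approach 1 could come in.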

How would we convert the ByteSequence that hadoop uses for the body into either JSON or a body class?

Potential example to convert ByteSequence to String

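 // Assumes inputStream wraps the ByteSequence content; readUtf8 is an illustrative name (needs java.io imports).
 static String readUtf8(InputStream inputStream) throws IOException {
     ByteArrayOutputStream result = new ByteArrayOutputStream();
     byte[] buffer = new byte[1024];
     for (int length; (length = inputStream.read(buffer)) != -1; ) {
         result.write(buffer, 0, length);
     }
     return result.toString("UTF-8");
 }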

Potential example to convert a JSON string into the JsonData that .document() from the Java client accepts

JsonpMapper mapper = client._transport().jsonpMapper();
JsonParser parser = mapper.jsonProvider().createParser(new StringReader(jsonString));
JsonData data = JsonData.from(parser, mapper);
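For bulk requests specifically, the body is newline-delimited JSON, so before any JsonData conversion the bytes would have to be split into action/document pairs. A minimal sketch of that step follows, using only the JDK. BulkBodySplitter and Entry are hypothetical names, and the sketch assumes every action line is followed by a source line (as for index, create, and update actions); delete actions, which carry no source line, are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BulkBodySplitter {
    private BulkBodySplitter() {}

    /** One bulk entry: the action/metadata line and the document line that follows it. */
    static final class Entry {
        final String action;
        final String source;

        Entry(String action, String source) {
            this.action = action;
            this.source = source;
        }
    }

    /** Splits a UTF-8 NDJSON bulk body into (action, source) pairs. */
    static List<Entry> split(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);

        // Collect non-empty lines; trim() also removes a trailing '\r'.
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }

        // Assumption: strictly alternating action and source lines.
        if (lines.size() % 2 != 0) {
            throw new IllegalArgumentException("Expected action/source line pairs, got " + lines.size() + " lines");
        }

        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += 2) {
            entries.add(new Entry(lines.get(i), lines.get(i + 1)));
        }
        return entries;
    }
}
```

Each Entry.source string could then go through the JsonData conversion shown above, while Entry.action would still need to be parsed to pick the matching bulk operation.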

A bigger question is, what have I missed in the implementation of the Java client and the hadoop client that would require a third approach?

Do you have any additional context?

This is also a feature request in https://github.com/opensearch-project/spring-data-opensearch/issues/19 and can help consolidate the approaches.

Feb 24 '23 17:02 harshavamsi

@wbeckler @VachaShah @nknize @dblock would love any feedback.

Feb 24 '23 17:02 harshavamsi

I'll dig deeper but my initial reaction would be to refactor the java client transport as a core library so we take the dependency on opensearch-core and a new opensearch-transport library instead of a cross plugin dependency.

Feb 24 '23 22:02 nknize

I think opensearch-java needs the ability to make pure HTTP requests, and to expose it, to avoid being a bottleneck, and all the implementations of the actual strongly typed methods should use that. For this client, taking a dependency on opensearch-java seems like the right call.

Feb 28 '23 20:02 dblock

@harshavamsi what is preventing us from using opensearch-java as intended, with typed request and response models? (I think that is what you meant in approach #2.)

Mar 07 '23 18:03 reta

@harshavamsi what is preventing us from using opensearch-java as intended, with typed request and response models? (I think that is what you meant in approach #2.)

Yes, I wasn't very sure if we should be using typed requests/responses, given that the hadoop client today does not have any way of determining the types of API calls that are made. Based on the comments in https://github.com/opensearch-project/opensearch-java/issues/377, I think it's fair for both clients to have this feature. This makes it much easier to implement the client here. What were you thinking about doing in https://github.com/opensearch-project/spring-data-opensearch/issues/19? Were you going to pull in the request/response types from opensearch-java?

Mar 10 '23 18:03 harshavamsi

What were you thinking about doing in https://github.com/opensearch-project/spring-data-opensearch/issues/19? Were you going to pull in the request/response types from opensearch-java?

Yes, the plan going forward is to recommend opensearch-java as the only official client for communicating with OpenSearch. I think we formalized it here [1]

[1] https://github.com/opensearch-project/OpenSearch/issues/5424

Mar 10 '23 18:03 reta
