opensearch-hadoop
[FEATURE] Add OpenSearch-Java transport as an option
Is your feature request related to a problem?
Currently the hadoop client uses its own custom RestClient to make requests to an OpenSearch cluster. While this works today, we’d like to let users choose between that and the optional ApacheHttpClient5Transport provided by the OpenSearch-Java client. To do this, we would need to add a new transport option and import the java client into hadoop.
Adding the OpenSearch-Java transport as an option lets us build on top of the upstream client and add new features such as backpressure awareness and the others described in https://github.com/opensearch-project/opensearch-clients/issues/27
What solution would you like?
This diagram illustrates how the java client and hadoop client make requests to OpenSearch today:

The hadoop client just translates incoming queries into a SimpleHTTP request, using the URL, path, query parameters, method, and headers to construct it. OpenSearch-Java, on the other hand, uses a Request/Response builder pattern and exposes those builder methods to the caller.
Approach 1
One approach would be to add a new abstraction layer in OpenSearch-Java that abstracts away the GET/POST/PUT/DELETE methods. The hadoop client could then use the appropriate class, construct the request, and let the java client handle everything else. This has also been requested in https://github.com/opensearch-project/opensearch-java/issues/377

Pros:
- Would make it easier in the hadoop client to leverage OpenSearch-Java without having to explicitly parse the Request and Response classes
- Would let people use the client as is for existing APIs and allow them to hit other OpenSearch endpoints that are not supported in the client today, e.g. calling plugin APIs
Cons:
- Providing an easy REST layer could overshadow the client's typed endpoints: developers may stop using them as intended, which adds maintenance overhead and potentially goes against good design patterns?
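To make Approach 1 more concrete, here is a minimal sketch of what such a layer might look like from the hadoop side. `GenericRequest`, `GenericResponse`, `GenericTransport`, and `EchoTransport` are hypothetical names, not existing OpenSearch-Java types, and the in-memory implementation only stands in for a real HTTP transport.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical request type: HTTP method, path, query parameters, and a raw body.
final class GenericRequest {
    final String method;
    final String path;
    final Map<String, String> params;
    final String body;

    GenericRequest(String method, String path, Map<String, String> params, String body) {
        this.method = method;
        this.path = path;
        this.params = Collections.unmodifiableMap(new LinkedHashMap<>(params));
        this.body = body;
    }
}

// Hypothetical response type: status code plus the raw response body.
final class GenericResponse {
    final int status;
    final String body;

    GenericResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

// The abstraction the hadoop client would program against.
interface GenericTransport {
    GenericResponse execute(GenericRequest request);
}

// Stand-in implementation for illustration: echoes the method and path
// instead of sending anything over the network.
final class EchoTransport implements GenericTransport {
    @Override
    public GenericResponse execute(GenericRequest request) {
        return new GenericResponse(200, request.method + " " + request.path);
    }
}
```

With a layer like this, the hadoop client would only build a `GenericRequest` from the method, path, parameters, and body it already has, and the same call shape would also work for plugin endpoints the typed client does not cover.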
Approach 2
Parse the incoming request at the hadoop layer and use the appropriate OpenSearch-Java request and response class to send the request.
Implementation questions:
- what’s a good design pattern for this?
Example design pattern:
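import java.util.Arrays;
import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;
import org.opensearch.client.RestClientBuilder;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.core.BulkRequest;
import org.opensearch.client.opensearch.core.BulkResponse;
import org.opensearch.client.opensearch.core.SearchRequest;
import org.opensearch.client.opensearch.core.SearchResponse;
import org.opensearch.client.transport.OpenSearchTransport;
import org.opensearch.client.transport.rest_client.RestClientTransport;
// Request and ByteSequence are the hadoop client's own types.

public class JavaClientTransport {
    public void executeRequest(Request.Method method, CharSequence uri, CharSequence path, CharSequence params, ByteSequence body, String operationType) throws Exception {
        // Create the client ("endpoint" and "protocol" are placeholders; a real implementation would reuse one client)
        RestClientBuilder builder = RestClient.builder(new HttpHost("endpoint", 9200, "protocol"));
        RestClient restClient = builder.build();
        OpenSearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
        OpenSearchClient client = new OpenSearchClient(transport);
        switch (operationType) {
            case "bulk":
                // Build the request right here; we appear to have all the information needed, but the body
                // still has to be converted into the operations that BulkRequest expects.
                BulkRequest bulkRequest = new BulkRequest.Builder().index("index").build();
                BulkResponse bulkResponse = client.bulk(bulkRequest);
                break; // without this, execution falls through into the "search" case
            case "search":
                SearchRequest searchRequest = new SearchRequest.Builder().index(Arrays.asList("index")).build();
                // Object.class is a placeholder document type; the original sketch passed null here.
                SearchResponse<Object> searchResponse = client.search(searchRequest, Object.class);
                break;
            default:
                throw new Exception("No matching path found");
        }
    }
}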
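The `operationType` argument above has to come from somewhere, since the hadoop client only knows the HTTP method and path. One option is to derive it from those two values, as in the rough sketch below. `OperationResolver` and its `Operation` enum are hypothetical names, and only the `_bulk` and `_search` endpoints are handled.

```java
import java.util.Locale;

// Hypothetical helper: maps an HTTP method and path onto an operation type.
final class OperationResolver {
    enum Operation { BULK, SEARCH, UNKNOWN }

    static Operation resolve(String method, String path) {
        String verb = method.toUpperCase(Locale.ROOT);

        // Drop a leading slash and any query string before inspecting the path.
        String normalized = path.startsWith("/") ? path.substring(1) : path;
        int query = normalized.indexOf('?');
        if (query >= 0) {
            normalized = normalized.substring(0, query);
        }

        // The last path segment names the API, e.g. "my-index/_bulk".
        String[] segments = normalized.split("/");
        String last = segments[segments.length - 1];

        if (last.equals("_bulk") && (verb.equals("POST") || verb.equals("PUT"))) {
            return Operation.BULK;
        }
        if (last.equals("_search") && (verb.equals("GET") || verb.equals("POST"))) {
            return Operation.SEARCH;
        }
        return Operation.UNKNOWN;
    }
}
```

Every endpoint the hadoop client calls would need its own rule here, which is part of the cost of Approach 2 compared with a generic REST layer.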
- how would we convert the ByteSequence that hadoop uses for the body into either JSON or a body class?
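Potential example to convert ByteSequence to String, assuming the body has first been exposed as an InputStream (readBody is an illustrative name):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

static String readBody(InputStream inputStream) throws IOException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    // Copy the stream in 1 KB chunks until read() signals end of stream with -1.
    for (int length; (length = inputStream.read(buffer)) != -1; ) {
        result.write(buffer, 0, length);
    }
    return result.toString("UTF-8");
}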
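If the body is already backed by a byte array, the stream copy may not be needed at all. The sketch below assumes the hadoop ByteSequence can hand over its backing array together with an offset and length, which should be checked against the actual class. `BodyDecoder` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a slice of a byte array as UTF-8 text.
final class BodyDecoder {
    static String toUtf8String(byte[] bytes, int offset, int length) {
        // Only the given slice is decoded, so spare capacity at the end of a
        // reused buffer does not leak into the resulting string.
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

Passing the length explicitly matters when the buffer is larger than the payload, which is common for buffers that are reused across requests.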
Potential example to convert a JSON string to the JsonData that .document() from the Java client accepts.
JsonpMapper mapper = client._transport().jsonpMapper();
JsonParser parser = mapper.jsonProvider().createParser(new StringReader(jsonString));
JsonData data = JsonData.from(parser, mapper);
A bigger question is, what have I missed in the implementation of the Java client and the hadoop client that would require a third approach?
Do you have any additional context?
This is also a feature request in https://github.com/opensearch-project/spring-data-opensearch/issues/19 and can help consolidate the approaches.
@wbeckler @VachaShah @nknize @dblock would love any feedback.
I'll dig deeper but my initial reaction would be to refactor the java client transport as a core library so we take the dependency on opensearch-core and a new opensearch-transport library instead of a cross plugin dependency.
I think opensearch-java needs to be able to make pure HTTP requests, and to expose that ability, to avoid being a bottleneck; all the implementations of the actual strongly typed methods should then be built on it. For this client, taking a dependency on opensearch-java seems like the right call.
@harshavamsi what is preventing us from using opensearch-java as intended, with typed request and response models? (I think that is what you meant in approach #2).
Yes, I wasn't sure whether we should be using typed requests/responses, given that the hadoop client today has no way of determining the types of the API calls being made. Based on the comments in https://github.com/opensearch-project/opensearch-java/issues/377, I think it's fair for both clients to have this feature. This makes it much easier to implement the client here. What were you thinking about doing in https://github.com/opensearch-project/spring-data-opensearch/issues/19? Were you going to pull in the request/response types from opensearch-java?
Yes, the plan going forward is to recommend opensearch-java as the only official client for communicating with OpenSearch. I think we formalized it here [1]
[1] https://github.com/opensearch-project/OpenSearch/issues/5424