flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-25756] [connectors/opensearch] Dedicated Opensearch connectors

Open reta opened this issue 3 years ago • 9 comments

Signed-off-by: Andriy Redko [email protected]

What is the purpose of the change

The goal of this change is to provide dedicated Opensearch connectors.

Brief change log

The implementation is largely based on the existing Elasticsearch 7 connector with a few notable changes (besides the dependencies and APIs):

  • any mentions and uses of mapping types have been removed: it is deprecated feature, scheduled for removal (the indices with mapping types cannot be created or migrated to Opensearch 1.x and beyond)
  • any mentions and uses have been removed: it is deprecated feature, scheduled for removal (only HighLevelRestClient is used)
  • the default distributions of Opensearch come with HTTPS turned on, using self-signed certificates: to simplify the integration a new option allow-insecure has been added to suppress certificates validation for development and testing purposes
  • old streaming APIs are also supported to facilitate the migration of existing applications from Elasticsearch 7/6 to Opensearch (the classes will change but the familiar model will stay)

The new connector name is opensearch and it follows the existing conventions:

CREATE TABLE users ( ... ) WITH (
  'connector' = 'opensearch', 
  'hosts' = 'https://localhost:9200',
  'index' = 'users', 
  'allow-insecure' = 'true', 
  'username' = 'admin', 
  'password' = 'admin');

Verifying this change

This change added comprehensive tests and can be verified as follows (largely ported the existing unit and integration tests for Elasticsearch 7):

  • Added unit tests
  • Added integration tests for end-to-end
  • Added end-to-end tests
  • Manually verified the connector by running a node clusters

Does this pull request potentially affect one of the following parts:

  • Dependencies: yes (the latest Opensearch 1.2.4 APIs as of this moment)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? (docs - in progress, JavaDocs)

Huge thanks @snuyanzin for help.

reta avatar Jan 27 '22 15:01 reta

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 95b34f9a3ff26cdbb2edc4250e494614287f5b5b (Thu Jan 27 15:31:50 UTC 2022)

Warnings:

  • 4 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

flinkbot avatar Jan 27 '22 15:01 flinkbot

CI report:

  • 1f873a1f24a100cf6f69edd95d7e3fb89c6b5072 Azure: FAILURE
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Jan 27 '22 15:01 flinkbot

Hi @reta Thanks for the contribution! I was wondering if you considered basing your implementation on the Generic Async Sink, new in 1.15?

CrynetLogistics avatar Feb 01 '22 14:02 CrynetLogistics

Hi @reta Thanks for the contribution! I was wondering if you considered basing your implementation on the Generic Async Sink, new in 1.15?

Hey @CrynetLogistics , thanks a lot for pointing it out. No, we have not considered the Generic Async Sink API, it seems like it was introduced at the same time this pull request was created (but it sounds like a good future improvement for sure). Thank you!

reta avatar Feb 01 '22 15:02 reta

@reta Thanks for your patience! We've started this week with our first external connector repo project, which is moving out the Elasticsearch connector from this repository to https://github.com/apache/flink-connector-elasticsearch

I think it would be best to first get that one moved out, so we can understand the actual issues that we might run into. When that one is done, I propose to create a dedicated repo for Opensearch and move your code to that repo. What do you think?

MartijnVisser avatar Mar 29 '22 07:03 MartijnVisser

I think it would be best to first get that one moved out, so we can understand the actual issues that we might run into. When that one is done, I propose to create a dedicated repo for Opensearch and move your code to that repo. What do you think?

Sounds great, thank you @MartijnVisser ! Mind looking at https://github.com/apache/flink/pull/18634 before moving Elasticsearch repository (cleanup test dependencies)? Thank you!

reta avatar Mar 29 '22 12:03 reta

@MartijnVisser I see that https://github.com/apache/flink-connector-elasticsearch is getting filled in, do you think I could re-target the pull request to this repository (or, alternatively, new one https://github.com/apache/flink-connector-opensearch could be created)? Thank you.

reta avatar May 26 '22 16:05 reta

@MartijnVisser I see that https://github.com/apache/flink-connector-elasticsearch is getting filled in, do you think I could re-target the pull request to this repository (or, alternatively, new one https://github.com/apache/flink-connector-opensearch could be created)? Thank you.

Based on the latest discussion on the mailing list, we identified that if we want to create a new connector, we need to create a small FLIP (see https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template). Do you have edit rights on the wiki? If not, I can ask a PMC to take care of that for you. After a FLIP discussion and a vote, we can create the repo directly. I think that makes the most sense. What do you think?

MartijnVisser avatar May 30 '22 14:05 MartijnVisser

Based on the latest discussion on the mailing list, we identified that if we want to create a new connector, we need to create a small FLIP (see https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template). Do you have edit rights on the wiki? If not, I can ask a PMC to take care of that for you. After a FLIP discussion and a vote, we can create the repo directly. I think that makes the most sense. What do you think?

Thanks @MartijnVisser , sure, I will do FLIP and follow the process. I don't have write permissions for Apache Flink space to create the new page for connector, I would really appreciate your help (same ASF username as on Github), thank you!

reta avatar May 30 '22 16:05 reta

FLIP has been accepted: https://cwiki.apache.org/confluence/display/FLINK/FLIP-243%3A+Dedicated+Opensearch+connectors

MartijnVisser avatar Nov 14 '22 14:11 MartijnVisser

@reta Could you sync this PR to use the same setup as https://github.com/apache/flink-connector-elasticsearch ? Especially the Parent POM setup + the CI stuff.

MartijnVisser avatar Nov 14 '22 14:11 MartijnVisser

@reta Could you sync this PR to use the same setup as https://github.com/apache/flink-connector-elasticsearch ? Especially the Parent POM setup + the CI stuff.

Will do, thanks @MartijnVisser !

reta avatar Nov 14 '22 14:11 reta

@MartijnVisser I am closing this one in favor of https://github.com/apache/flink-connector-opensearch/pull/1

reta avatar Nov 14 '22 19:11 reta