[FLINK-25756] [connectors/opensearch] Dedicated Opensearch connectors
Signed-off-by: Andriy Redko [email protected]
What is the purpose of the change
The goal of this change is to provide dedicated Opensearch connectors.
Brief change log
The implementation is largely based on the existing Elasticsearch 7 connector with a few notable changes (besides the dependencies and APIs):
- any mentions and uses of mapping types have been removed: it is a deprecated feature scheduled for removal (indices with mapping types cannot be created in or migrated to Opensearch 1.x and beyond)
- any mentions and uses have been removed: it is a deprecated feature scheduled for removal (only HighLevelRestClient is used)
- the default distributions of Opensearch come with HTTPS turned on, using self-signed certificates: to simplify the integration, a new option allow-insecure has been added to suppress certificate validation for development and testing purposes
- old streaming APIs are also supported to facilitate the migration of existing applications from Elasticsearch 7/6 to Opensearch (the classes will change but the familiar model will stay); a sketch follows the SQL example below
The new connector name is opensearch and it follows the existing conventions:
CREATE TABLE users ( ... ) WITH (
'connector' = 'opensearch',
'hosts' = 'https://localhost:9200',
'index' = 'users',
'allow-insecure' = 'true',
'username' = 'admin',
'password' = 'admin');
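To illustrate the old-style streaming API mentioned in the change log, here is a minimal sketch of attaching the sink to a DataStream. It assumes the class and package names (OpensearchSink, OpensearchSinkFunction, RequestIndexer under org.apache.flink.streaming.connectors.opensearch) and the builder methods mirror the existing Elasticsearch 7 connector; treat it as an outline rather than the final API.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    import org.apache.flink.api.common.functions.RuntimeContext;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.opensearch.OpensearchSink;
    import org.apache.flink.streaming.connectors.opensearch.OpensearchSinkFunction;
    import org.apache.flink.streaming.connectors.opensearch.RequestIndexer;

    import org.apache.http.HttpHost;
    import org.opensearch.action.index.IndexRequest;
    import org.opensearch.client.Requests;

    public class UsersSinkSketch {

        public static void attachSink(DataStream<String> users) {
            // Point at the local HTTPS endpoint from the SQL example above.
            List<HttpHost> hosts = Collections.singletonList(new HttpHost("localhost", 9200, "https"));

            // Turn each record into an IndexRequest targeting the "users" index.
            OpensearchSinkFunction<String> indexFunction = new OpensearchSinkFunction<String>() {
                @Override
                public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
                    Map<String, String> source = Collections.singletonMap("name", element);
                    IndexRequest request = Requests.indexRequest().index("users").source(source);
                    indexer.add(request);
                }
            };

            OpensearchSink.Builder<String> builder = new OpensearchSink.Builder<>(hosts, indexFunction);
            // Flush after every element to keep the example simple; tune for real workloads.
            builder.setBulkFlushMaxActions(1);

            users.addSink(builder.build());
        }
    }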
Verifying this change
This change adds comprehensive tests and can be verified as follows (largely ported from the existing unit and integration tests for Elasticsearch 7):
- Added unit tests
- Added integration tests (see the local-cluster sketch after this list)
- Added end-to-end tests
- Manually verified the connector by running a node cluster
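For reference, a minimal sketch of how a throwaway single-node cluster can be stood up for the integration tests. OpensearchContainer and getHttpHostAddress() come from the separate opensearch-testcontainers module and are assumptions here, mirroring how the Elasticsearch connector tests use ElasticsearchContainer; they are not part of this pull request.

    import org.junit.jupiter.api.Test;
    import org.opensearch.testcontainers.OpensearchContainer;
    import org.testcontainers.utility.DockerImageName;

    class LocalClusterSketch {

        @Test
        void startsSingleNodeCluster() {
            // Spin up a single-node Opensearch 1.2.4 in Docker (assumed container class).
            try (OpensearchContainer container =
                    new OpensearchContainer(DockerImageName.parse("opensearchproject/opensearch:1.2.4"))) {
                container.start();

                // The HTTP endpoint to pass to the connector's 'hosts' option.
                String hosts = container.getHttpHostAddress();

                // ... run the sink or the table connector against 'hosts' ...
            }
        }
    }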
Does this pull request potentially affect one of the following parts:
- Dependencies: yes (the latest Opensearch 1.2.4 APIs as of this moment)
- The public API, i.e., is any changed class annotated with @Public(Evolving): yes
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? (docs - in progress, JavaDocs)
Huge thanks to @snuyanzin for the help.
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
Automated Checks
Last check on commit 95b34f9a3ff26cdbb2edc4250e494614287f5b5b (Thu Jan 27 15:31:50 UTC 2022)
Warnings:
- 4 pom.xml files were touched: Check for build and licensing issues.
- No documentation files were touched! Remember to keep the Flink docs up to date!
- This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.
Mention the bot in a comment to re-run the automated checks.
Review Progress
- ❓ 1. The [description] looks good.
- ❓ 2. There is [consensus] that the contribution should go into Flink.
- ❓ 3. Needs [attention] from.
- ❓ 4. The change fits into the overall [architecture].
- ❓ 5. Overall code [quality] is good.
Please see the Pull Request Review Guide for a full explanation of the review process.
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
- @flinkbot approve all to approve all aspects
- @flinkbot approve-until architecture to approve everything until architecture
- @flinkbot attention @username1 [@username2 ..] to require somebody's attention
- @flinkbot disapprove architecture to remove an approval you gave earlier
CI report:
- 1f873a1f24a100cf6f69edd95d7e3fb89c6b5072 Azure: FAILURE
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure to re-run the last Azure build
Hi @reta Thanks for the contribution! I was wondering if you considered basing your implementation on the Generic Async Sink, new in 1.15?
Hey @CrynetLogistics, thanks a lot for pointing it out. No, we have not considered the Generic Async Sink API; it seems it was introduced around the same time this pull request was created (but it sounds like a good future improvement for sure). Thank you!
@reta Thanks for your patience! This week we've started our first external connector repository project, which moves the Elasticsearch connector out of this repository to https://github.com/apache/flink-connector-elasticsearch
I think it would be best to first get that one moved out, so we can understand the actual issues that we might run into. When that one is done, I propose to create a dedicated repo for Opensearch and move your code to that repo. What do you think?
Sounds great, thank you @MartijnVisser! Mind looking at https://github.com/apache/flink/pull/18634 before moving the Elasticsearch connector to its new repository (it cleans up test dependencies)? Thank you!
@MartijnVisser I see that https://github.com/apache/flink-connector-elasticsearch is getting filled in, do you think I could re-target the pull request to this repository (or, alternatively, new one https://github.com/apache/flink-connector-opensearch could be created)? Thank you.
Based on the latest discussion on the mailing list, we identified that if we want to create a new connector, we need to create a small FLIP (see https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template). Do you have edit rights on the wiki? If not, I can ask a PMC to take care of that for you. After a FLIP discussion and a vote, we can create the repo directly. I think that makes the most sense. What do you think?
Thanks @MartijnVisser, sure, I will create a FLIP and follow the process. I don't have write permissions in the Apache Flink wiki space to create the new connector page, so I would really appreciate your help (same ASF username as on GitHub), thank you!
FLIP has been accepted: https://cwiki.apache.org/confluence/display/FLINK/FLIP-243%3A+Dedicated+Opensearch+connectors
@reta Could you sync this PR to use the same setup as https://github.com/apache/flink-connector-elasticsearch ? Especially the Parent POM setup + the CI stuff.
Will do, thanks @MartijnVisser !
@MartijnVisser I am closing this one in favor of https://github.com/apache/flink-connector-opensearch/pull/1