[FLINK-26203][Connectors / Pulsar][docs] Introduce Flink Pulsar SQL Connector
What is the purpose of the change
This PR is to introduce the basic Pulsar SQL Connector (both source and sink) and tests.
It is a big PR. Ideally it should be multiple PRs but for some reasons I kept it in one PR:
- The code is ready for source and sink, and they are not very very big changes.
- Splitting to source and sink is available, but since many tests requires both source and sink, keeping them together would make the testing easier.
- Some features make better sense when looking at both the source and sink, such as metadata, upsert, serialization/deserialization.
However such big PR is always hard to review. So to make the review process easier I tried to divided the PR to multiple smaller commits (each containing at most 4 or 5 files) according to their functionality. I wrote some notes in the last section which might be useful.
Brief change log
- Added PulsarTableFactory
- Adde PulsarTableSink and PulsarTableSource
- supports some VIRTUAL and NON VIRTUAL metadata mapping
- supports upsert mode code (not enabled by default) (though the upsert-pulsar is not included in this PR)
Verifying this change
This change added tests and can be verified as follows:
- Added integration tests for reading and writing to pulsar topics using different schema
- Added factory tests for verifying the source and sink sql connector is created as expected
- Added config related tests to make sure configs are validated and do not conflict
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): Yes
- The public API, i.e., is any changed class annotated with
@Public(Evolving): Yes - The runtime per-record code paths (performance sensitive): Yes
Documentation
Yes this PR involves documentation
Some notes
Here are some thought about this PR. any suggestions are greatly welcomed and it would be nice to have your thoughts~
- Currently this PR does not involve documentation. we want to create a different PR for SQL Connector documentation. WDYT ?
- The e2e test (SQL client e2e test) is not present in this PR as well. This is because we didn't have the SQL Client e2e test in our fork, and since it's new code, we want to create a new PR for e2e
- The naming of commits does not have a JIRA ticket to track. This is because I want to keep each commit as independent as possible during review process. Ideally we will rebase these commits and add JIRA number later.
There are 16 commits numbered from a to p ( I don't know why I didn't use numbers XD, a rush a blood to the head probably). I tried to make the commits as independent as possible.
- a to c: SQL Connector code
- d to g: Testing code
- h: packaging and manifest change. I am not sure if this part is correct because we did some hack on the pom when we were maintaining our own fork. Your input is important here ~
- i: doc and doc generation configs
- j to o: support SQL Connector involves some changes in the DataStream connector code base as well. I prefixed such commits with "DataStream"
Other commits are checkstyle or some later changes. For later fixing commits (after review), I'll use number plus the letter to track which commits the fixing points to .
CI report:
- 614d9259688bd8512ebd478ef487359a67343694 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:@flinkbot run azurere-run the last Azure build
Thanks for your contribution. I'll review this ASAP.
With https://github.com/apache/flink-connector-pulsar/pull/1 getting close to get merged, let's not merge this PR but:
- Wait for the mentioned PR to be merged
- We'll release the Pulsar connector 3.0.0, which is the equivalent of the
release-1.16version. - We'll update the Pulsar connector with whatever changes that exist in
masterbut haven't been externalised yet. - Move this PR (and other Pulsar PRs) to the external connector repository so we can separately from the Flink release, release a new version of the Flink connector.