flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-26203][Connectors / Pulsar][docs] Introduce Flink Pulsar SQL Connector

Open imaffe opened this issue 3 years ago • 2 comments

What is the purpose of the change

This PR is to introduce the basic Pulsar SQL Connector (both source and sink) and tests.

It is a big PR. Ideally it should be multiple PRs but for some reasons I kept it in one PR:

  • The code is ready for source and sink, and they are not very very big changes.
  • Splitting to source and sink is available, but since many tests requires both source and sink, keeping them together would make the testing easier.
  • Some features make better sense when looking at both the source and sink, such as metadata, upsert, serialization/deserialization.

However such big PR is always hard to review. So to make the review process easier I tried to divided the PR to multiple smaller commits (each containing at most 4 or 5 files) according to their functionality. I wrote some notes in the last section which might be useful.

Brief change log

  • Added PulsarTableFactory
  • Adde PulsarTableSink and PulsarTableSource
  • supports some VIRTUAL and NON VIRTUAL metadata mapping
  • supports upsert mode code (not enabled by default) (though the upsert-pulsar is not included in this PR)

Verifying this change

This change added tests and can be verified as follows:

  • Added integration tests for reading and writing to pulsar topics using different schema
  • Added factory tests for verifying the source and sink sql connector is created as expected
  • Added config related tests to make sure configs are validated and do not conflict

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): Yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): Yes
  • The runtime per-record code paths (performance sensitive): Yes

Documentation

Yes this PR involves documentation

Some notes

Here are some thought about this PR. any suggestions are greatly welcomed and it would be nice to have your thoughts~

  • Currently this PR does not involve documentation. we want to create a different PR for SQL Connector documentation. WDYT ?
  • The e2e test (SQL client e2e test) is not present in this PR as well. This is because we didn't have the SQL Client e2e test in our fork, and since it's new code, we want to create a new PR for e2e
  • The naming of commits does not have a JIRA ticket to track. This is because I want to keep each commit as independent as possible during review process. Ideally we will rebase these commits and add JIRA number later.

There are 16 commits numbered from a to p ( I don't know why I didn't use numbers XD, a rush a blood to the head probably). I tried to make the commits as independent as possible.

  • a to c: SQL Connector code
  • d to g: Testing code
  • h: packaging and manifest change. I am not sure if this part is correct because we did some hack on the pom when we were maintaining our own fork. Your input is important here ~
  • i: doc and doc generation configs
  • j to o: support SQL Connector involves some changes in the DataStream connector code base as well. I prefixed such commits with "DataStream"

Other commits are checkstyle or some later changes. For later fixing commits (after review), I'll use number plus the letter to track which commits the fixing points to .

imaffe avatar Sep 22 '22 11:09 imaffe

CI report:

  • 614d9259688bd8512ebd478ef487359a67343694 Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Sep 22 '22 11:09 flinkbot

Thanks for your contribution. I'll review this ASAP.

syhily avatar Sep 23 '22 06:09 syhily

With https://github.com/apache/flink-connector-pulsar/pull/1 getting close to get merged, let's not merge this PR but:

  1. Wait for the mentioned PR to be merged
  2. We'll release the Pulsar connector 3.0.0, which is the equivalent of the release-1.16 version.
  3. We'll update the Pulsar connector with whatever changes that exist in master but haven't been externalised yet.
  4. Move this PR (and other Pulsar PRs) to the external connector repository so we can separately from the Flink release, release a new version of the Flink connector.

MartijnVisser avatar Nov 29 '22 16:11 MartijnVisser