flink-connector-kafka
flink-connector-kafka copied to clipboard
[FLINK-32522][connectors/kafka] Kafka connector should depend on commons-collections instead of inheriting from flink
What is the purpose of the change
As this thread https://lists.apache.org/thread/l98pc18onxrcrsb01x5kh1vppl7ymk2d discussed. Connectors shouldn't rely on dependencies that may or may not be available in Flink itself. But currently kafka connector use commons-collections from flink, we should depend on commons-collections and bundle it in shaded-jar. otherwise, connector will be vulnerable and even block upgrading of flink.
Brief change log
Add commons-collections in pom
Verifying this change
This change is a trivial rework / code cleanup. Verifying by current total cases.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): yes
- The public API, i.e., is any changed class annotated with
@Public(Evolving)
: no - The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
@MartijnVisser @XComp Hi, guys. PTAL. thanks
@MartijnVisser hi, Martijn. Can you help to review it?
@chucheng92 Looking at the dependency tree, there's still commons-collections coming in via Flink itself. If we want to remove the dependency, we should also make sure that it's not included from there
Yes. I have updated the PR.
old commons-collections in kafka connector is inherited from flink (so we will see it in kafka). This is a circular dependency and we should break it.
What I am thinking about now is to change the kafka connector to commons-collections4. When the kafka connector is released, upgrade the kafka-connector that Python depends on to this version on the flink side. In this way, it is decoupled, and the flink side can also be upgraded to commons-collections4. So the https://github.com/apache/flink/pull/21442 can be resolved. And it will remove all the old commons-collections from flink side.
@chucheng92 Looking at the dependency tree, there's still commons-collections coming in via Flink itself. If we want to remove the dependency, we should also make sure that it's not included from there
Of course, the more important point is that pyflink should not rely on connectors, even for integration tests