flink-connector-kafka icon indicating copy to clipboard operation
flink-connector-kafka copied to clipboard

[FLINK-32522][connectors/kafka] Kafka connector should depend on commons-collections instead of inheriting from flink

Open chucheng92 opened this issue 1 year ago • 4 comments

What is the purpose of the change

As this thread https://lists.apache.org/thread/l98pc18onxrcrsb01x5kh1vppl7ymk2d discussed. Connectors shouldn't rely on dependencies that may or may not be available in Flink itself. But currently kafka connector use commons-collections from flink, we should depend on commons-collections and bundle it in shaded-jar. otherwise, connector will be vulnerable and even block upgrading of flink.

Brief change log

Add commons-collections in pom

Verifying this change

This change is a trivial rework / code cleanup. Verifying by current total cases.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

chucheng92 avatar Jul 03 '23 13:07 chucheng92

@MartijnVisser @XComp Hi, guys. PTAL. thanks

chucheng92 avatar Jul 04 '23 10:07 chucheng92

@MartijnVisser hi, Martijn. Can you help to review it?

chucheng92 avatar Jul 10 '23 11:07 chucheng92

@chucheng92 Looking at the dependency tree, there's still commons-collections coming in via Flink itself. If we want to remove the dependency, we should also make sure that it's not included from there

Yes. I have updated the PR.

old commons-collections in kafka connector is inherited from flink (so we will see it in kafka). This is a circular dependency and we should break it.

What I am thinking about now is to change the kafka connector to commons-collections4. When the kafka connector is released, upgrade the kafka-connector that Python depends on to this version on the flink side. In this way, it is decoupled, and the flink side can also be upgraded to commons-collections4. So the https://github.com/apache/flink/pull/21442 can be resolved. And it will remove all the old commons-collections from flink side.

chucheng92 avatar Oct 11 '23 10:10 chucheng92

@chucheng92 Looking at the dependency tree, there's still commons-collections coming in via Flink itself. If we want to remove the dependency, we should also make sure that it's not included from there

Of course, the more important point is that pyflink should not rely on connectors, even for integration tests

chucheng92 avatar Oct 11 '23 10:10 chucheng92