nifi
nifi copied to clipboard
NIFI-11259 - Kafka processor refactor
Background
Previous iterations of support for Kafka client versions in NiFi (1.0, 2.0, 2.6) duplicated code from existing Kafka processors into new Maven modules, adjusted Kafka client library dependencies for the new modules, and adjusted for API differences as needed. The original JIRA associated with NiFi support for Kafka 3 (NIFI-9330
), followed this same approach. After discussion, a new approach was chosen, and a new JIRA (NIFI-11259
) created.
- Refactor Kafka client library dependencies into a new controller service.
- Expose a service API that was agnostic of any particular Kafka version.
- Create new processor implementations that interacted with Kafka through the service API.
In particular, the new processor module should have no Kafka dependencies. This is expected to ease the transition to future Kafka versions.
- A new 3.0++ controller service might be created to isolate any major API changes to the Kafka client APIs.
- The new
PublishKafka
andConsumeKafka
processors would need minimal / no changes to enable interactivity with the new controller service.
Other refactoring activities have been undertaken at the same time:
- The new
PublishKafka
processor is intended as a replacement for the existingPublishKafka_2_6
andPublishKafkaRecord_2_6
processor pair. It provides FlowFile-based or record-based data handling modes, controlled via a per-processor property. Similarly,ConsumeKafka
replaces bothConsumeKafka_2_6
andConsumeKafkaRecord_2_6
. This design adjustment reduces the code duplication that was present in the 2.6 processor set.
New Modules
-
nifi-kafka-service-api
- API contract forKafkaConnectionService
, which exposes access to instances ofKafkaConsumerService
andKafkaProducerService
in a manner agnostic to a particular version of Kafka-
KafkaProducerService
- intermediary logical service brokering interactions with the producer APIs of the Kafka client libraries -
KafkaConsumerService
- intermediary logical service brokering interactions with the producer APIs of the Kafka client libraries
-
-
nifi-kafka-service-api-nar
- NiFi NAR wrapper for theKafkaConnectionService
API contract -
nifi-kafka-3-service
- home forKafka3ConnectionService
, which abstracts Kafka dependencies away from the new Kafka processors, and manages runtime connections to a remote Kafka 3 service instance -
nifi-kafka-3-service-nar
- NiFi NAR wrapper forKafka3ConnectionService
-
nifi-kafka-processors
- home forPublishKafka
andConsumeKafka
processors, which allow interactivity with remote Kafka service instances agnostic to a particular Kafka version -
nifi-kafka-nar
- NiFi NAR wrapper for thePublishKafka
andConsumeKafka
processors -
nifi-kafka-2-6-integration
- test bed to establish runtime behavior (testcontainers/kafka) of Kafka 2.6 processors for certain conditions, in order to better replicate those behaviors -
nifi-kafka-3-integration
- testing infrastructure to test new processors / controller service while communicating with an actual (testcontainers) Kafka instance -
nifi-kafka-jacoco
- module home for configuration to instrument executions ofnifi-kafka-bundle
unit tests / integration tests, in order to assess test coverage
Notes
- Integration tests are employed as a "first-class citizen" means of testing expected interactions with Kafka instances, running in
testcontainers
.- https://github.com/testcontainers/testcontainers-java
- It is possible to use a single instance of testcontainers/kafka per Maven module, in order to incur the startup/teardown cost only once. The intent is to employ this strategy where feasible.
- Instances of
simplelogger.properties
have been useful during development, but these would be removed before PR merge. - I’ve done a significant amount of runtime testing against containerized Kafka instances using the repo/branch below. This resource may be useful for others who want to do runtime testing without the need for fixed Kafka infrastructure.
- https://github.com/greyp9/kafka-images/tree/NIFI-12194
- There are opportunities to refactor common methods and declarations into the
nifi-kafka-shared
module; I’ve avoided that during development to ease the process of rebasing tonifi/main
. - It is likely that this set of new components will co-exist with the existing Kafka 2.6 based processors for some period of time.
- Allow for migration of existing flows to use the new components.
- Slight behavioral differences might be anticipated during the "burn in" phase of the new components, due to the scope of work.
- Code compatibility with JRE 8 has been targeted, to leave open the option of backporting this work to the support/1.x branch.
- The PR as is should support the following authentication strategies:
PLAINTEXT
,SSL
,SASL_PLAINTEXT
, andSASL_SSL
. Support for additional authentication strategies could be handled via follow on JIRAs. - Support for the
KafkaRecordSink
form factor could be handled via follow on JIRAs. - Support for Kafka "exactly once" semantics could be handled via follow on JIRAs.
- Migration documentation could be handled via follow on JIRAs.
Issue Tracking
- [x] Apache NiFi Jira issue created
Pull Request Tracking
- [x] Pull Request title starts with Apache NiFi Jira issue number, such as
NIFI-00000
- [x] Pull Request commit message starts with Apache NiFi Jira issue number, as such
NIFI-00000
Pull Request Formatting
- [x] Pull Request based on current revision of the
main
branch - [x] Pull Request refers to a feature branch with one commit containing changes
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
- [x] Build completed using
mvn clean install -P contrib-check
- [x] JDK 21
Licensing
- [ ] New dependencies are compatible with the Apache License 2.0 according to the License Policy
- [ ] New dependencies are documented in applicable
LICENSE
andNOTICE
files
Documentation
- [ ] Documentation formatting appears as expected in rendered files