NIFI-11259 - Kafka processor refactor

Open greyp9 opened this issue 11 months ago • 0 comments

Background

Previous iterations of support for Kafka client versions in NiFi (1.0, 2.0, 2.6) duplicated code from existing Kafka processors into new Maven modules, adjusted Kafka client library dependencies for the new modules, and adjusted for API differences as needed. The original JIRA associated with NiFi support for Kafka 3 (NIFI-9330) followed this same approach. After discussion, a new approach was chosen, and a new JIRA (NIFI-11259) was created. The new approach was to:

  • Refactor Kafka client library dependencies into a new controller service.
  • Expose a service API that is agnostic of any particular Kafka version.
  • Create new processor implementations that interact with Kafka through the service API.

In particular, the new processor module should have no Kafka dependencies. This is expected to ease the transition to future Kafka versions.

  • A new 3.0++ controller service might be created to isolate any major changes to the Kafka client APIs.
  • The new PublishKafka and ConsumeKafka processors would need minimal or no changes to interact with the new controller service.

Other refactoring activities have been undertaken at the same time:

  • The new PublishKafka processor is intended as a replacement for the existing PublishKafka_2_6 and PublishKafkaRecord_2_6 processor pair. It provides FlowFile-based or record-based data handling modes, controlled via a per-processor property. Similarly, ConsumeKafka replaces both ConsumeKafka_2_6 and ConsumeKafkaRecord_2_6. This design adjustment reduces the code duplication that was present in the 2.6 processor set.
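
As a rough illustration of the single-processor, two-mode design described above, the sketch below shows one method that dispatches on a mode value. The names `PublishModeSketch`, `PublishMode`, and `toMessages` are hypothetical and are not taken from the PR; newline-delimited text stands in for a configured record reader, and the mapping of one record to one Kafka message is an assumption based on how the 2.6 record processor behaves.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: one processor covering both data handling modes via a single property. */
final class PublishModeSketch {

    /** Hypothetical property values; the actual property name and values may differ. */
    enum PublishMode { FLOWFILE, RECORD }

    /**
     * Converts FlowFile content into the message payloads that would be handed to Kafka.
     * FLOWFILE mode: the whole content becomes a single message.
     * RECORD mode: each non-empty line becomes one message (stand-in for a record reader).
     */
    static List<byte[]> toMessages(final byte[] content, final PublishMode mode) {
        final List<byte[]> messages = new ArrayList<>();
        switch (mode) {
            case FLOWFILE:
                messages.add(content);
                break;
            case RECORD:
                final String text = new String(content, StandardCharsets.UTF_8);
                for (final String line : text.split("\n")) {
                    if (!line.isEmpty()) {
                        messages.add(line.getBytes(StandardCharsets.UTF_8));
                    }
                }
                break;
            default:
                throw new IllegalArgumentException("Unsupported mode: " + mode);
        }
        return messages;
    }

    public static void main(final String[] args) {
        final byte[] content = "{\"id\":1}\n{\"id\":2}\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(toMessages(content, PublishMode.FLOWFILE).size()); // prints 1
        System.out.println(toMessages(content, PublishMode.RECORD).size());   // prints 2
    }
}
```

Keeping the mode as a value that the processor switches on, rather than as a separate class per mode, is what removes the need for a FlowFile/record processor pair in this sketch.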

New Modules

  • nifi-kafka-service-api - API contract for KafkaConnectionService, which exposes access to instances of KafkaConsumerService and KafkaProducerService in a manner agnostic to a particular version of Kafka

    • KafkaProducerService - intermediary logical service brokering interactions with the producer APIs of the Kafka client libraries
    • KafkaConsumerService - intermediary logical service brokering interactions with the consumer APIs of the Kafka client libraries
  • nifi-kafka-service-api-nar - NiFi NAR wrapper for the KafkaConnectionService API contract

  • nifi-kafka-3-service - home for Kafka3ConnectionService, which abstracts Kafka dependencies away from the new Kafka processors, and manages runtime connections to a remote Kafka 3 service instance

  • nifi-kafka-3-service-nar - NiFi NAR wrapper for Kafka3ConnectionService

  • nifi-kafka-processors - home for PublishKafka and ConsumeKafka processors, which interact with remote Kafka service instances in a manner agnostic to a particular Kafka version

  • nifi-kafka-nar - NiFi NAR wrapper for the PublishKafka and ConsumeKafka processors

  • nifi-kafka-2-6-integration - test bed (testcontainers/kafka) that establishes the runtime behavior of the Kafka 2.6 processors under certain conditions, so that the new processors can better replicate that behavior

  • nifi-kafka-3-integration - testing infrastructure to test new processors / controller service while communicating with an actual (testcontainers) Kafka instance

  • nifi-kafka-jacoco - module home for configuration to instrument executions of nifi-kafka-bundle unit tests / integration tests, in order to assess test coverage
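
The layering in the module list above might be sketched as follows. Only the type names `KafkaConnectionService`, `KafkaProducerService`, and `KafkaConsumerService` come from this description; every method name and signature below is hypothetical, and `InMemoryConnectionService` is an illustrative stand-in for a version-specific implementation such as Kafka3ConnectionService, not the PR's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: processors depend on interfaces, never on Kafka client classes. */
final class KafkaServiceApiSketch {

    /** Producer-side contract; the method shown is hypothetical. */
    interface KafkaProducerService {
        void send(String topic, byte[] value);
    }

    /** Consumer-side contract; the method shown is hypothetical. */
    interface KafkaConsumerService {
        List<byte[]> poll(String topic);
    }

    /** Entry point handed to processors; accessor names are hypothetical. */
    interface KafkaConnectionService {
        KafkaProducerService getProducerService();
        KafkaConsumerService getConsumerService();
    }

    /** In-memory stand-in for a version-specific service implementation. */
    static final class InMemoryConnectionService implements KafkaConnectionService {
        private final Map<String, List<byte[]>> topics = new HashMap<>();

        @Override
        public KafkaProducerService getProducerService() {
            return (topic, value) -> topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(value);
        }

        @Override
        public KafkaConsumerService getConsumerService() {
            return topic -> {
                // Drain the topic so that each message is returned once.
                final List<byte[]> drained = topics.remove(topic);
                return drained == null ? new ArrayList<byte[]>() : drained;
            };
        }
    }

    /** Processor-side code: compiled only against the interfaces above. */
    static int publishAndConsume(final KafkaConnectionService service) {
        service.getProducerService().send("demo", "hello".getBytes(StandardCharsets.UTF_8));
        return service.getConsumerService().poll("demo").size();
    }

    public static void main(final String[] args) {
        System.out.println(publishAndConsume(new InMemoryConnectionService())); // prints 1
    }
}
```

Under these assumptions, supporting a later Kafka client version would mean adding another `KafkaConnectionService` implementation, while `publishAndConsume` and similar processor-side code stay unchanged.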

Notes

  • Integration tests are treated as a "first-class citizen" means of testing expected interactions with Kafka instances running in testcontainers.
    • https://github.com/testcontainers/testcontainers-java
  • It is possible to use a single instance of testcontainers/kafka per Maven module, in order to incur the startup/teardown cost only once. The intent is to employ this strategy where feasible.
  • Instances of simplelogger.properties have been useful during development, but these will be removed before the PR is merged.
  • I’ve done a significant amount of runtime testing against containerized Kafka instances using the repo/branch below. This resource may be useful for others who want to do runtime testing without the need for fixed Kafka infrastructure.
    • https://github.com/greyp9/kafka-images/tree/NIFI-12194
  • There are opportunities to refactor common methods and declarations into the nifi-kafka-shared module; I’ve avoided that during development to ease the process of rebasing to nifi/main.
  • It is likely that this set of new components will co-exist with the existing Kafka 2.6 based processors for some period of time.
    • This allows existing flows to be migrated to the new components.
    • Slight behavioral differences might be anticipated during the "burn in" phase of the new components, due to the scope of the work.
  • Code compatibility with JRE 8 has been targeted, to leave open the option of backporting this work to the support/1.x branch.
  • As is, the PR should support the following authentication strategies: PLAINTEXT, SSL, SASL_PLAINTEXT, and SASL_SSL. Support for additional authentication strategies could be handled via follow-on JIRAs.
  • Support for the KafkaRecordSink form factor could be handled via follow-on JIRAs.
  • Support for Kafka "exactly once" semantics could be handled via follow-on JIRAs.
  • Migration documentation could be handled via follow-on JIRAs.
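
The four strategies named in the notes above correspond to the values of the Kafka client `security.protocol` setting. The sketch below shows how a controller service might translate a chosen protocol into client properties; the class and method names (`KafkaSecuritySketch`, `clientProperties`) and all example values are hypothetical, while the property keys are standard Kafka client configuration keys. It is a minimal illustration, not the PR's implementation.

```java
import java.util.Properties;

/** Sketch only: maps a chosen security protocol to Kafka client configuration keys. */
final class KafkaSecuritySketch {

    /** Mirrors the four values listed in the notes. */
    enum SecurityProtocol { PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL }

    /**
     * Builds client properties for the given protocol.
     * SASL settings are applied only for SASL_* protocols,
     * and the truststore only for protocols that use TLS.
     */
    static Properties clientProperties(final String bootstrapServers,
                                       final SecurityProtocol protocol,
                                       final String saslMechanism,
                                       final String jaasConfig,
                                       final String truststorePath) {
        final Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", bootstrapServers);
        properties.setProperty("security.protocol", protocol.name());

        final boolean usesSasl = protocol == SecurityProtocol.SASL_PLAINTEXT
                || protocol == SecurityProtocol.SASL_SSL;
        final boolean usesTls = protocol == SecurityProtocol.SSL
                || protocol == SecurityProtocol.SASL_SSL;

        if (usesSasl) {
            properties.setProperty("sasl.mechanism", saslMechanism);
            properties.setProperty("sasl.jaas.config", jaasConfig);
        }
        if (usesTls) {
            properties.setProperty("ssl.truststore.location", truststorePath);
        }
        return properties;
    }

    public static void main(final String[] args) {
        // Placeholder values for illustration only.
        final String jaas = "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"user\" password=\"secret\";";
        final Properties properties = clientProperties(
                "localhost:9092", SecurityProtocol.SASL_SSL, "PLAIN", jaas, "/path/to/truststore.p12");
        properties.stringPropertyNames().stream().sorted().forEach(System.out::println);
    }
}
```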

Issue Tracking

Pull Request Tracking

  • [x] Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • [x] Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • [x] Pull Request based on current revision of the main branch
  • [x] Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • [x] Build completed using mvn clean install -P contrib-check
    • [x] JDK 21

Licensing

  • [ ] New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • [ ] New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • [ ] Documentation formatting appears as expected in rendered files

greyp9, Mar 01 '24 20:03