kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-13299: Accept duplicate listener on port for IPv4/IPv6

Open mdedetrich opened this issue 3 years ago • 25 comments

Loosens the validation so that we accept duplicate listeners on the same port but if and only if the listeners are valid IP addresses with one address being an IPv4 address and the other being an IPv6 address. The detection of whether an address is a valid IP/IPv4/IPv6 address is done using Apache's commons-validator. Outside of this specific case the validations are kept the same albeit error messages have been updated to reflect that there is this new exception case of accepting IPv4/IPv6 on the same port.

This PR tests different corner cases by checking whether the changed listenerListToEndPoints either throws an exception (and its message or not). The test follows the exact same pattern as the testDuplicateListeners test. Locally on my machine the PR passes both testIPv4AndIPv6SamePortListeners (the new test) and testDuplicateListeners pass but I will wait for CI to finish.

Committer Checklist (excluded from commit message)

  • [ ] Verify design and implementation
  • [x] Verify test coverage and CI build status
  • [ ] Verify documentation (including upgrade notes)

mdedetrich avatar Nov 09 '21 13:11 mdedetrich

@mdedetrich Thanks for the PR. As this changes a bit the semantic of the configuration, we might need a small KIP for it. Have you considered doing one?

dajac avatar Nov 09 '21 18:11 dajac

Sure I will create one tomorrow, I didn't realize that adjusting the behavior while keep the exact same public interface required a KIP.

mdedetrich avatar Nov 10 '21 03:11 mdedetrich

Yeah... I would say that this one is a bit on the edge. I feel like we need a small KIP because the semantic of the configuration changes with your patch. Thanks.

dajac avatar Nov 10 '21 07:11 dajac

No worries, will write one today. In regards to the CI tests, in general they seem to be flaky. When I ran :core:integrationTest locally only the following tests failed and they seem unrelated.

DynamicBrokerReconfigurationTest > testAdvertisedListenerUpdate() FAILED
    org.opentest4j.AssertionFailedError: Unexpected exception type thrown ==> expected: <java.util.concurrent.ExecutionException> but was: <java.util.concurrent.TimeoutException>
        at app//org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:65)
        at app//org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:37)
        at app//org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3007)
        at app//kafka.server.DynamicBrokerReconfigurationTest.testAdvertisedListenerUpdate(DynamicBrokerReconfigurationTest.scala:990)

        Caused by:
        java.util.concurrent.TimeoutException: Timeout after waiting for 2000 ms.
            at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:76)
            at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
            at kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testAdvertisedListenerUpdate$8(DynamicBrokerReconfigurationTest.scala:990)
            at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:55)
            ... 3 more

DeleteTopicTest > testAddPartitionDuringDeleteTopic() FAILED
    kafka.admin.AdminOperationException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/test
        at app//kafka.zk.AdminZkClient.writeTopicPartitionAssignment(AdminZkClient.scala:178)
        at app//kafka.zk.AdminZkClient.createPartitionsWithAssignment(AdminZkClient.scala:298)
        at app//kafka.zk.AdminZkClient.addPartitions(AdminZkClient.scala:231)
        at app//kafka.admin.DeleteTopicTest.testAddPartitionDuringDeleteTopic(DeleteTopicTest.scala:290)

On another note Is there a place in this PR/repo where I should update the documentation?

mdedetrich avatar Nov 10 '21 10:11 mdedetrich

I just updated/rebased the PR, apart from a very minor syntactic fixes we also now output the port when its context is relevant in the validation to help users. This was mainly done because the additional effort to do this was trivial (since we already went through the process of grouping ports we can say which port specifically failed the validation).

mdedetrich avatar Nov 10 '21 11:11 mdedetrich

@dajac KIP created at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195726330

mdedetrich avatar Nov 10 '21 16:11 mdedetrich

I have updated the PR to add some upgrade notes to docs/upgrade.html. I am not sure if additional documentation is needed elsewhere (I had a look at docs in general and couldn't find anything specific enough but I may have missed something).

mdedetrich avatar Nov 18 '21 11:11 mdedetrich

I have updated the PR to add documentation to https://kafka.apache.org/documentation/#brokerconfigs_listeners

mdedetrich avatar Nov 18 '21 12:11 mdedetrich

I have rebased the PR against the latest Kafka trunk

mdedetrich avatar Feb 11 '22 00:02 mdedetrich

@showuon @tombentley @mimaison Since KIP-797 has the necessary amount of binding votes this PR is now ready to be reviewed.

mdedetrich avatar Jun 17 '22 09:06 mdedetrich

@jsancio Now that you have created the 3.3.0 release branch do I need to change the base branch of this PR to the new 3.3.0 branch or should it remain on trunk?

mdedetrich avatar Jul 11 '22 18:07 mdedetrich

@mdedetrich Sorry for the delay. I hope to review your PR in the coming days.

Please keep it again trunk, if we also want it in 3.3, we'll backport it.

mimaison avatar Jul 11 '22 19:07 mimaison

No worries, I am just a bit unfamiliar with the process for KIP's. Also letting you know the KIP was accepted into the 3.3 release so I do believe it needs to be backported as well?

mdedetrich avatar Jul 11 '22 22:07 mdedetrich

@mdedetrich The KIP was accepted in 3.3 but we assumed it would get merged before feature freeze. Feature freeze happened last week (see https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.3.0) so it's now at @jsancio's discretion whether it will be backported to 3.3 or not once this is merged to trunk (we always first merge to trunk).

Since it is fairly small, if we can merge it in the next few days, you may be able to convince José to include it. I'll try taking a look this afternoon or tomorrow.

mimaison avatar Jul 12 '22 10:07 mimaison

@mimaison If you approve and merge this soon. I am okay with you cherry picking it to the 3.3 branch.

jsancio avatar Jul 13 '22 22:07 jsancio

@tombentley @mimaison Thanks for the review!

I have gone through each of the review comments and submitted the requested changes as their own commit and I also marked them as resolved, only one comment is still outstanding.

mdedetrich avatar Jul 18 '22 13:07 mdedetrich

In that case, should we do this validation? inetAddressValidator.isValid(ep.host). It looks like we don't care about if the host is valid or not, all we want to filter out, is the null host case (i.e. PLAINTEXT://:9096), is my understanding correct?

@showuon So we still need it, not for seeing whether hosts are invalid or not (since we don't care about hosts) but rather to filter out all of the valid IP addresses. This is answered in more detail at https://github.com/apache/kafka/pull/11478#discussion_r928815976.

As mentioned above if we are on the same page then creating a JIRA issue to do additional checking to see if hosts are valid seems appropriate.

mdedetrich avatar Jul 25 '22 12:07 mdedetrich

As mentioned above if we are on the same page then creating a JIRA issue to do additional checking to see if hosts are valid seems appropriate.

Yes, please create a JIRA issue. Thanks.

showuon avatar Jul 25 '22 13:07 showuon

Yes, please create a JIRA issue. Thanks.

JIRA issue created at https://issues.apache.org/jira/browse/KAFKA-14103

mdedetrich avatar Jul 25 '22 14:07 mdedetrich

@mimaison @tombentley , do you want to have another look?

showuon avatar Jul 26 '22 01:07 showuon

Thanks @mdedetrich, the current changes look good.

I wonder if we should also add an integration test. We should be able to extend MultipleListenersWithSameSecurityProtocolBaseTest to test duplicate listeners. There may be other existing tests which would be good candidates for adding this, if you find a better place go for it.

mimaison avatar Aug 01 '22 17:08 mimaison

I wonder if we should also add an integration test. We should be able to extend MultipleListenersWithSameSecurityProtocolBaseTest to test duplicate listeners. There may be other existing tests which would be good candidates for adding this, if you find a better place go for it.

I just did a quick glance of MultipleListenersWithSameSecurityProtocolBaseTest and if I understand correctly this appears to be non trivial to extend for this specific test case due to the fact that currently MultipleListenersWithSameSecurityProtocolBaseTest configures the endpoints to use localhost:0 where the 0 is a magic constant which the def validate function deliberately filters out to bypass any kind of validation specifically for the case of this kind of testing (i.e. the reasoning behind the // filter port 0 for unit tests comment).

mdedetrich avatar Aug 01 '22 20:08 mdedetrich

Actually following on from my previous comment (assuming its correct), the currently existing MultipleListenersWithSameSecurityProtocolBaseTest tests are already covering what would be tested as a result of this PR. Since the MultipleListenersWithSameSecurityProtocolBaseTest effectively disables the listener validation (by filtering out the listeners which contain magic 0 port) it already happens to be testing the case of having multiple brokers with different listener names.

mdedetrich avatar Aug 02 '22 07:08 mdedetrich

@mdedetrich Sorry for the late reply. I would have liked an integration test that checks 2 listeners, ipv4 and ipv6, can be on the same port as it's the core feature of this KIP. None of the existing integration tests seem to cover that scenario.

mimaison avatar Aug 16 '22 18:08 mimaison

@mimaison

Completely agree that an integration test is ideal to specifically test kafka being able to start multiple instances on same machine, one IPv4 and other IPv6, I will start working on this.

I was just slightly confused by the currently existing tests you were recommending to extend, thanks for clarifying that it was in fact testing something else.

mdedetrich avatar Aug 17 '22 07:08 mdedetrich

@mdedetrich Are you still interested in completing this KIP? IIRC the changes were fine but I was wondering if an integration test would be necessary. @jlprat @C0urante WDYT?

mimaison avatar Apr 05 '23 16:04 mimaison

@mimaison Thanks for the reminder!

Yes I am still interested, unfortunately due to the non trivial nature of the type of tests I needed to write the this ended up becoming de-prioritized due to the amount of work necessary on my end (also our internal usecase of this feature also was delayed).

If its still decided we want full integration tests than in around a few weeks I should have time to look into it.

mdedetrich avatar Apr 05 '23 16:04 mdedetrich

Hi @mimaison! We have a variation (simplification) of this change running on our production for over a year and so far no problems. So I'd be fine with merging the change as it is.

jlprat avatar Apr 18 '23 09:04 jlprat

Hi @jlprat, the point of tests is also to ensure future changes don't break this feature. You're more familiar with this feature than me, if you think the unit tests are enough, you can merge the PR.

mimaison avatar Apr 18 '23 13:04 mimaison

Hi @mimaison, the current unit tests present in the PR seem that they will cover the case of a potential involuntary regression for this feature. So I'm fine adding the change as it currently stands.

@mdedetrich Could you update the documentation bit? Currently, it is under 3.3, but it should be moved under a new 3.6 section.

jlprat avatar Apr 19 '23 06:04 jlprat