KAFKA-13299: Accept duplicate listener on port for IPv4/IPv6
Loosens the validation so that duplicate listeners on the same port are accepted, but only if both listeners are valid IP addresses, one IPv4 and the other IPv6. Detecting whether an address is a valid IP/IPv4/IPv6 address is done using Apache's commons-validator. Outside of this specific case the validations are unchanged, although the error messages have been updated to mention the new exception of accepting IPv4/IPv6 on the same port.
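For context, here is a minimal sketch of the acceptance rule described above, assuming commons-validator's InetAddressValidator; the object and method names are purely illustrative and not the code in the PR:

```scala
import org.apache.commons.validator.routines.InetAddressValidator

// Illustrative only: these names are hypothetical, not the code in the PR.
object ListenerPortValidationSketch {
  private val inetAddressValidator = InetAddressValidator.getInstance()

  // A pair of listener hosts declared on the same port is acceptable only when both are
  // valid IP address literals and exactly one is IPv4 while the other is IPv6.
  def duplicatePortAllowed(hostsOnSamePort: Seq[String]): Boolean =
    hostsOnSamePort.size == 2 &&
      hostsOnSamePort.forall(h => h != null && inetAddressValidator.isValid(h)) &&
      hostsOnSamePort.count(h => inetAddressValidator.isValidInet4Address(h)) == 1 &&
      hostsOnSamePort.count(h => inetAddressValidator.isValidInet6Address(h)) == 1
}
```

Under such a rule, PLAINTEXT://192.168.0.1:9092 together with SSL://[::1]:9092 is accepted, while two IPv4 listeners sharing a port are still rejected.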
This PR tests different corner cases by checking whether the changed listenerListToEndPoints throws an exception (and, if so, with which message). The test follows the exact same pattern as the existing testDuplicateListeners test. Locally on my machine both testIPv4AndIPv6SamePortListeners (the new test) and testDuplicateListeners pass, but I will wait for CI to finish.
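For reviewers who want a feel for the new test without opening the diff, here is a rough sketch of its shape (the exact signature of listenerListToEndPoints, the security-protocol map, and the thrown exception type are assumptions here, not the actual test in the PR):

```scala
import kafka.utils.CoreUtils
import org.apache.kafka.common.network.ListenerName
import org.apache.kafka.common.security.auth.SecurityProtocol
import org.junit.jupiter.api.Assertions.assertThrows
import org.junit.jupiter.api.Test

class SamePortListenersSketchTest {
  // Assumed mapping from listener name to security protocol, as listenerListToEndPoints expects.
  private val protocolMap = Map(
    new ListenerName("PLAINTEXT") -> SecurityProtocol.PLAINTEXT,
    new ListenerName("SSL") -> SecurityProtocol.SSL
  )

  @Test
  def ipv4AndIpv6MayShareAPort(): Unit = {
    // One IPv4 listener and one IPv6 listener on the same port should now be accepted.
    CoreUtils.listenerListToEndPoints("PLAINTEXT://127.0.0.1:9092,SSL://[::1]:9092", protocolMap)
  }

  @Test
  def twoIpv4ListenersOnOnePortAreStillRejected(): Unit = {
    // Two IPv4 listeners on the same port remain invalid (exception type assumed).
    assertThrows(classOf[IllegalArgumentException], () =>
      CoreUtils.listenerListToEndPoints("PLAINTEXT://127.0.0.1:9092,SSL://10.0.0.1:9092", protocolMap))
  }
}
```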
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [x] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
@mdedetrich Thanks for the PR. As this changes the semantics of the configuration a bit, we might need a small KIP for it. Have you considered doing one?
Sure, I will create one tomorrow. I didn't realize that adjusting the behavior while keeping the exact same public interface required a KIP.
Yeah... I would say that this one is a bit on the edge. I feel like we need a small KIP because the semantics of the configuration change with your patch. Thanks.
No worries, will write one today. Regarding the CI tests, they seem to be flaky in general. When I ran :core:integrationTest locally, only the following tests failed and they seem unrelated.
DynamicBrokerReconfigurationTest > testAdvertisedListenerUpdate() FAILED
org.opentest4j.AssertionFailedError: Unexpected exception type thrown ==> expected: <java.util.concurrent.ExecutionException> but was: <java.util.concurrent.TimeoutException>
at app//org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:65)
at app//org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:37)
at app//org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3007)
at app//kafka.server.DynamicBrokerReconfigurationTest.testAdvertisedListenerUpdate(DynamicBrokerReconfigurationTest.scala:990)
Caused by:
java.util.concurrent.TimeoutException: Timeout after waiting for 2000 ms.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:76)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
at kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testAdvertisedListenerUpdate$8(DynamicBrokerReconfigurationTest.scala:990)
at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:55)
... 3 more
DeleteTopicTest > testAddPartitionDuringDeleteTopic() FAILED
kafka.admin.AdminOperationException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/test
at app//kafka.zk.AdminZkClient.writeTopicPartitionAssignment(AdminZkClient.scala:178)
at app//kafka.zk.AdminZkClient.createPartitionsWithAssignment(AdminZkClient.scala:298)
at app//kafka.zk.AdminZkClient.addPartitions(AdminZkClient.scala:231)
at app//kafka.admin.DeleteTopicTest.testAddPartitionDuringDeleteTopic(DeleteTopicTest.scala:290)
On another note, is there a place in this PR/repo where I should update the documentation?
I just updated/rebased the PR. Apart from some very minor syntactic fixes, the validation now also outputs the port when its context is relevant, to help users. This was mainly done because the additional effort was trivial (since we already go through the process of grouping ports, we can say which port specifically failed the validation).
@dajac KIP created at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195726330
I have updated the PR to add some upgrade notes to docs/upgrade.html. I am not sure if additional documentation is needed elsewhere (I had a look at docs in general and couldn't find anything specific enough, but I may have missed something).
I have updated the PR to add documentation to https://kafka.apache.org/documentation/#brokerconfigs_listeners
I have rebased the PR against the latest Kafka trunk
@showuon @tombentley @mimaison Since KIP-797 has the necessary number of binding votes, this PR is now ready to be reviewed.
@jsancio Now that you have created the 3.3.0 release branch do I need to change the base branch of this PR to the new 3.3.0 branch or should it remain on trunk?
@mdedetrich Sorry for the delay. I hope to review your PR in the coming days.
Please keep it against trunk; if we also want it in 3.3, we'll backport it.
No worries, I am just a bit unfamiliar with the process for KIPs. Also, letting you know that the KIP was accepted into the 3.3 release, so I believe it needs to be backported as well?
@mdedetrich The KIP was accepted in 3.3 but we assumed it would get merged before feature freeze. Feature freeze happened last week (see https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.3.0) so it's now at @jsancio's discretion whether it will be backported to 3.3 or not once this is merged to trunk (we always first merge to trunk).
Since it is fairly small, if we can merge it in the next few days, you may be able to convince José to include it. I'll try taking a look this afternoon or tomorrow.
@mimaison If you approve and merge this soon, I am okay with you cherry-picking it to the 3.3 branch.
@tombentley @mimaison Thanks for the review!
I have gone through each of the review comments, submitted the requested changes as their own commits, and marked them as resolved; only one comment is still outstanding.
In that case, should we do this validation at all: inetAddressValidator.isValid(ep.host)? It looks like we don't care whether the host is valid or not; all we want to filter out is the null host case (i.e. PLAINTEXT://:9096). Is my understanding correct?
@showuon So we still need it, not to check whether hosts are valid (since we don't care about hosts in general) but rather to filter out all of the valid IP addresses. This is answered in more detail at https://github.com/apache/kafka/pull/11478#discussion_r928815976.
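To make that concrete, here is a small hypothetical sketch of the validator being used as a filter rather than a gate; the names and shapes are illustrative:

```scala
import org.apache.commons.validator.routines.InetAddressValidator

// isValid is not used to reject "bad" hosts, only to pick out endpoints whose host is a
// literal IP address so the IPv4/IPv6 same-port rule can be applied to them. Entries like
// PLAINTEXT://:9096 (null/empty host) simply fall outside that check.
object SamePortGroupingSketch {
  private val inetAddressValidator = InetAddressValidator.getInstance()

  final case class Ep(host: String, port: Int)

  def ipEndpointsByPort(endpoints: Seq[Ep]): Map[Int, Seq[Ep]] =
    endpoints
      .filter(ep => ep.host != null && inetAddressValidator.isValid(ep.host))
      .groupBy(_.port)
}
```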
As mentioned above, if we are on the same page, then creating a JIRA issue to do additional checking of whether hosts are valid seems appropriate.
Yes, please create a JIRA issue. Thanks.
JIRA issue created at https://issues.apache.org/jira/browse/KAFKA-14103
@mimaison @tombentley , do you want to have another look?
Thanks @mdedetrich, the current changes look good.
I wonder if we should also add an integration test. We should be able to extend MultipleListenersWithSameSecurityProtocolBaseTest to test duplicate listeners. There may be other existing tests which would be good candidates for adding this; if you find a better place, go for it.
I just did a quick glance at MultipleListenersWithSameSecurityProtocolBaseTest and, if I understand correctly, it appears to be non-trivial to extend for this specific test case. MultipleListenersWithSameSecurityProtocolBaseTest currently configures the endpoints to use localhost:0, where 0 is a magic constant that the validate function deliberately filters out to bypass any kind of validation, specifically to support this kind of testing (i.e. the reasoning behind the // filter port 0 for unit tests comment).
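For reference, a minimal sketch (hypothetical names, not the real CoreUtils code) of the port-0 escape hatch described above:

```scala
object PortZeroFilterSketch {
  final case class EndpointSketch(host: String, port: Int)

  // filter port 0 for unit tests: ephemeral ports chosen by the OS (as used by
  // MultipleListenersWithSameSecurityProtocolBaseTest) never take part in duplicate-port checks.
  def portsToValidate(endpoints: Seq[EndpointSketch]): Seq[Int] =
    endpoints.map(_.port).filter(_ != 0)
}
```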
Actually, following on from my previous comment (assuming it's correct), the currently existing MultipleListenersWithSameSecurityProtocolBaseTest tests already cover what would be tested as a result of this PR. Since MultipleListenersWithSameSecurityProtocolBaseTest effectively disables the listener validation (by filtering out the listeners which contain the magic 0 port), it already happens to be testing the case of having multiple brokers with different listener names.
@mdedetrich Sorry for the late reply. I would have liked an integration test that checks 2 listeners, ipv4 and ipv6, can be on the same port as it's the core feature of this KIP. None of the existing integration tests seem to cover that scenario.
@mimaison Completely agree that an integration test is ideal to specifically test Kafka being able to start multiple instances on the same machine, one IPv4 and the other IPv6; I will start working on this.
I was just slightly confused by the currently existing tests you recommended extending, thanks for clarifying that they were in fact testing something else.
@mdedetrich Are you still interested in completing this KIP? IIRC the changes were fine but I was wondering if an integration test would be necessary. @jlprat @C0urante WDYT?
@mimaison Thanks for the reminder!
Yes, I am still interested. Unfortunately, due to the non-trivial nature of the tests I needed to write, this ended up becoming de-prioritized because of the amount of work necessary on my end (our internal use case for this feature was also delayed).
If it's still decided that we want full integration tests, then in around a few weeks I should have time to look into it.
Hi @mimaison! We have a variation (simplification) of this change running in production for over a year and so far no problems. So I'd be fine with merging the change as it is.
Hi @jlprat, the point of tests is also to ensure future changes don't break this feature. You're more familiar with this feature than me, if you think the unit tests are enough, you can merge the PR.
Hi @mimaison, the current unit tests in the PR seem sufficient to catch a potential involuntary regression of this feature, so I'm fine merging the change as it currently stands.
@mdedetrich Could you update the documentation bit? Currently it is under 3.3, but it should be moved under a new 3.6 section.