KAFKA-17042: The migration docs should remind users to set "broker.id.generation.enable" when adding broker.id
In the section: Enter Migration Mode on the Brokers
It requires users to add broker.id, but this can produce an error: "broker.id must be greater than or equal to -1 and not greater than reserved.broker.max.id". This error is caused by the ZooKeeper broker using a generated broker ID.
As this phase is temporary, a simple solution is to remind users to add broker.id.generation.enable=false if the ZooKeeper broker is using a generated broker ID.
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
Hi @showuon and @chia7712
I have add some description for broker.id.generation.enable=false, PTAL 😃
Hi @chia7712 , I have do some changes based on feedback, PTAL 😸
I think the main question I have is, what does this relate to migration? When doing ZK migration, that means, the ZK broker is already up and healthy, before starting migrating to KRaft. And when entering in the Enter Migration Mode on the Brokers section, we ask users to add the following 4 configs:
- controller.quorum.voters
- controller.listener.names
- The controller.listener.name should also be added to listener.security.property.map
- zookeeper.metadata.migration.enable
Thus, I don't think it will cause the error as described in the JIRA, right? I still don't understand in what circumstance this issue will happen. It might need more explanation. Thanks.
Thus, I don't think it will cause the error as described in the JIRA, right? I still don't understand in what circumstance this issue will happen. It might need more explanation. Thanks.
thanks for this response, and it inspires me to dig-in the root cause. The issue happens when the zk broker is using generated broker id and it is in migrating.
- if we define the broker id in config file, the broker id is viewed as invalid since it is in the reserved range (when
broker.id.generation.enable=true) - if we don't define the broker id in config file, the broker id will be evaluated according to
meta.properties[0] . However, we forgot to updatenode.idand so kraft client will use "node.id=-1" [1] to build and it results in error ..
[0] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaServer.scala#L261 [1] https://github.com/apache/kafka/blob/e55c28c60b0f021b12c93fa04e360ce7a2e0a5ac/core/src/main/scala/kafka/raft/RaftManager.scala#L230
In short, we don't consider the use case of "generated broker id (the broker id is omitted)" + "zk migration". It seems we need two fixes:
- update docs to explain the possible error when you define the broker id for the zk broker which is using
broker.id.generation.enable - in order to minimize the changes, we can update both
broker.idandnode.idtogether. The side effect isKafkaConfig#nodeIdwill be a mutable variable.
@showuon WDYT?
- update docs to explain the possible error when you define the broker id for the zk broker which is using broker.id.generation.enable
If my understanding is correct, the possible error is the broker id will be ignored and the generated ID might exceed the range of max id, which cause the error reported in JIRA. Is that right?
- in order to minimize the changes, we can update both broker.id and node.id together. The side effect is KafkaConfig#nodeId will be a mutable variable.
IMO, this should be a bug. We should take the generated ID into node.id. WDYT?
@frankvicky , any thoughts about it? I'd also like to hear since you are the author of this PR.
If my understanding is correct, the possible error is the broker id will be ignored and the generated ID might exceed the range of max id, which cause the error reported in JIRA. Is that right?
Yep, ignoring the broker id can pass the reserved ids check but it fails on the default node.id. Explicitly defining the "broker.id=generated id" can update node.id to run Kraft components in zk broker but it fails on reserved ids check.
We should take the generated ID into node.id.
Yes. Specifically, We can define node.id too in updating broker.id manually no matter how we get the broker.id.
@frankvicky , any thoughts about it? I'd also like to hear since you are the author of this PR.
Hi @showuon !
The reason why broker.id.generation.enable might need to be disabled has been more precisely explained by @chia7712 compared to my previous explanation.
Regarding the second point from @chia7712 , I'm thinking the potential drawbacks of allowing node.id to be mutable.
IMHO, could these changes lead to inconsistencies in cluster coordination and communication? Might they also cause any unpredictable behavior?
Yep, ignoring the broker id can pass the reserved ids check but it fails on the default node.id.
Sorry, I don't understand what you mean here. Enabling the broker.id.generation.enable, we'll Ignore the broker ID config. "But it fails on the default node.id" <-- what does it mean?
Explicitly defining the "broker.id=generated id" can update node.id to run Kraft components in zk broker but it fails on reserved ids check.
"but it fails on reserved ids check" <-- what does it mean? If the ID is successfully generated, it should already be within the reserved range, why do we need another check?
Regarding the second point from @chia7712 , I'm thinking the potential drawbacks of allowing node.id to be mutable.
Currently, if node.id is not set, but broker.id is set, we'll set node.id=broker.id. So, we can add, if broker.id.generation.enable is enabled, and broker.id, node.id not set, we set them. That should fix the problem in this JIRA, right?
IMHO, could these changes lead to inconsistencies in cluster coordination and communication? Might they also cause any unpredictable behavior?
Could you explain more about the inconsistencies in cluster coordination and communication situation? I might miss something. Thanks.
I think my main point is, the JIRA said:
it requires users to add "broker.id", but it can produces error "broker.id must be greater than or equal to -1 and not greater than reserved.broker.max.id" too.
That is caused by the zk broker is using a generated broker id.
the root cause of the issue is the generated ID doesn't pass to node.id, which causes this error when startup, is that correct? If so, why cannot we fix the issue by setting node.id = generated ID, if not set? If not, please kindly let me know why this error will appear? Maybe a reproduce step is helpful. Thanks.
But it fails on the default node.id" <-- what does it mean? If the ID is successfully generated, it should already be within the reserved range, why do we need another check?
The story starts with a zk broker having a generated broker id.
Normally, it does not need to define the broker id in the config file. However, that can't work in migration aince Kraft class needs node id and node id is not initialized with (generated) broker id.
If we add the (generated) broker id to config file to initialize both ids, it can't pass the check due to reservation rules.
the root cause of the issue is the generated ID doesn't pass to node.id, which causes this error when startup, is that correct?
You are right and that is the root cause.
? If so, why cannot we fix the issue by setting node.id = generated ID, if not set?
That can fix the issue. The concern is that the node.id will be a mutable variable rather than a value from configs. That opens a dangerous door that enable everyone to change it in everywhere.
Maybe another solution is we can skip the id value check if it is in zk migration. The side effect is that zk broker needs to define broker id in config file for migration even though generated id is enabled.
Could you explain more about the inconsistencies in cluster coordination and communication situation? I might miss something. Thanks.
If I remember correctly, KafkaConfig can be changed externally. If node.id becomes a variable, users might unintentionally or intentionally change node.id, which is clearly not very secure.
Thanks for the response. So I think our problem to solve is, when running migration in Enter Migration Mode on the Brokers step, the ZK node "MUST" set broker.id, to avoid node.id set to -1 and fail to startup. @frankvicky , could you please focus on this issue to improve the doc? Thanks.
Thanks for @showuon to clarify the root cause. That is a great offline discussion 😃
The inconsistency between node.id and broker.id (when generated id is enabled) can be addressed in another issue. I will file it later
Hi @showuon , @chia7712
I have rewrite the description of broker.id to explain why we need this config, PTAL
Hi @showuon I have simplified the description based on feedback, PTAL
The inconsistency between node.id and broker.id (when generated id is enabled) can be addressed in another issue. I will file it later
https://issues.apache.org/jira/browse/KAFKA-17077
Hi @showuon I have adjust the description based on your feedback, PTAL
Hi @chia7712 , @showuon I have added a reminder about potential failures, PTAL 🐧