strimzi-kafka-operator
strimzi-kafka-operator copied to clipboard
[Enhancement]: Enrich KafkaRebalance status with progress info
Related problem
Today a KafkaRebalance object has only a Rebalancing status which means the rebalance is ongoing.
It would be nice to have progress info in the KafkaRebalance status.
Suggested solution
During KafkaRebalance reconcile, when UserTask status is IN_EXECUTION, we could call kafkacruisecontrol/state?substates=EXECUTOR&verbose=true&json=true in order to get info on the ongoing rebalancing (executor_state_example.json).
This way we would be able to enrich KafkaRebalance status with progress info like:
- number of partitions done vs. total + percentage
- number of MB moved vs. total + percentage
Alternatives
No response
Additional context
No response
While I think this is a good idea, at the same time I was wondering which info would be really useful but mainly how much big the verbose response could taking into account a big cluster with hundreds of topics and corresponding partitions.
From the example JSON you shared I could see useful totalDataToMove together with finishedDataMovement, numTotalPartitionMovements together with numFinishedPartitionMovements.
But then you also have more details for specific topic partitions. From your example I see just one topic (it's mytopic) and I was wondering how much this JSON can grow with tons of topics? Maybe it's just possible to avoid the verbose=true to get what we really need? Did you try it?
Discussed on the community call on 10.7.2024: This should be discussed on the next call with more Cruise Control SMEs on the call.
Not able to join today's call but I am waiting for @Vince-Chenal to think about my last questions/thoughts.
Hey @ppatierno, sorry I completely missed the notification.
I just ran another test and you're right without verbose option we still have enough information to show rebalance progress:
{
"ExecutorState": {
"finishedDataMovement": 0,
"numInProgressPartitionMovements": 0,
"triggeredTaskReason": "No reason provided (Client: 10.66.25.180, Date: 2024-08-09T08:00:32Z)",
"abortingPartitions": 0,
"triggeredSelfHealingTaskId": "",
"numFinishedPartitionMovements": 8,
"maximumConcurrentPartitionMovementsPerBroker": 1,
"numPendingPartitionMovements": 86,
"triggeredUserTaskId": "9eefaa9f-4c2a-47fb-9fcb-95c3d350c45a",
"totalDataToMove": 91212,
"minimumConcurrentPartitionMovementsPerBroker": 1,
"state": "INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS",
"averageConcurrentPartitionMovementsPerBroker": 1.0,
"numTotalPartitionMovements": 94
},
"version": 1
}
verbose adds details for every topic partitions in completedPartitionMovement/pendingPartitionMovement fields which we don't need :+1:
Thanks @Vince-Chenal, at this point I think it's something we can add. Are you willing to contribute to it?
@ppatierno I do not think it is that simple. You would also need to create a mechanism to avoid the reconciliation running in a tight loop because of this. So understanding the information that might be interesting is just one part. But it still needs to be triaged for the technical part and likely have a proposal.
I agree. I was asking because, it's not something as high priority imho. So it could not have an ETA or not being planned soon. So if anyone from the community would help, it's much better.
I have the impression that this getCruiseControlState function is never called anywhere
Do you know where it was called before and why it's been removed? We could have reuse this existing code
Discussed on a community call on 22.8.2024: This would be a good feature. But it should have a proposal to cover:
- How will the API look like
- How it will be implemented to avoid tight reconciliation loops
Completed by https://github.com/strimzi/strimzi-kafka-operator/pull/11348