KAFKA-5859: Avoid retaining AbstractRequest in RequestChannel.Response
This PR removes the need to keep a reference to the parsed AbstractRequest after it's been handled in KafkaApis. A reference to RequestAndSize which holds the parsed AbstractRequest in RequestChannel.Request was kept in scope as a consequence of being passed into the RequestChannel.Response after being handled.
The Jira ticket KAFKA-5859 suggests removing this reference as soon as it's no longer needed. I considered several implementations and I settled on creating a new type that contains all the relevant information of the Request that is required after it has been handled. I think this approach allows for the least amount of invasive changes in the Request/Response lifecycle while retaining the immutability of the RequestChannel.Request.
A new type called RequestChannel.RequestSummary now contains much of the information that was in RequestChannel.Request before. The RequestChannel.Request now generates a RequestChannel.RequestSummary that is passed into the RequestChannel.Response after being handled in KafkaApis. RequestChannel.RequestSummary contains information such as:
- A detailed and non-detailed description of the request
- Metrics associated with the request
- Helper methods to update various Request metrics
- A special case describing whether or not the original Request was a
FetchRequestand whether it was from a follower. This information is required in theupdateRequestMetricsmetrics helper method.
This change does not make any behaviour changes so no additional tests were added. I've verified that all unit and integration tests pass and no regressions were introduced. I'm interested in seeing the before and after results of the Confluent Kafka system tests as described in step 11 of the Contributing Code Changes section. I would like to request access to kick off this system tests suite if you agree that it's relevant to this ticket.
This is my first contribution to this project. I picked up this issue because it was marked with the newbie flag and it seemed like a good opportunity to learn more about about the request and response lifecycle in the Kafka broker.
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
FAILURE 8067 tests run, 5 skipped, 2 failed. --none--
FAILURE 7959 tests run, 5 skipped, 1 failed. --none--
SUCCESS 8067 tests run, 5 skipped, 0 failed. --none--
I ran the full testsuite locally and successfully (JDK 8, Scala 2.12), but I see it failed in this PR's build results. I observe that other recent PR's have similar transient test failures. Is there anything I need to do to resolve this?
Highlighting @ijuma since he was the original reporter of 5859. What do you think? May I gain access to see the Confluent System Tests to see how they performed with this change?
SUCCESS 8071 tests run, 5 skipped, 0 failed. --none--
SUCCESS 8071 tests run, 5 skipped, 0 failed. --none--
SUCCESS 8071 tests run, 5 skipped, 0 failed. --none--
FAILURE 7973 tests run, 5 skipped, 1 failed. --none--
FAILURE 7973 tests run, 5 skipped, 1 failed. --none--
FAILURE 7973 tests run, 5 skipped, 1 failed. --none--
FAILURE 7973 tests run, 5 skipped, 1 failed. --none--
SUCCESS 8081 tests run, 5 skipped, 0 failed. --none--
SUCCESS 8081 tests run, 5 skipped, 0 failed. --none--
I rebased from trunk last night. I ran the testsuite a few times locally and noticed transient errors with some of the integration tests, just like those reported by the build server. I'm not sure if there's anything that can be done or if they're at all related to this PR, but if there is something I can do then please let me know and I'll try to address it.
This PR is ready for a round 2 review.
Is the current state satisfactory to you @ijuma ? Let me know if there's any additional work you had in mind.
Hey @ijuma ,
I'm now creating the summary value lazily. When log tracing is not enabled the earliest reference that will cause it to evaluate is when the requestDequeueTimeNanos timing var is set right before the request is handled by KafkaApis:
https://github.com/seglo/kafka/blob/to-request-summary-5859/core/src/main/scala/kafka/server/KafkaRequestHandler.scala#L63
The timing variables used for metrics could be factored out into their own datastructure available within the request and the summary. Doing this would allow us to be even lazier about when to create the summary (i.e. exactly when the response is created). Do you think it's worth doing this?
I also modified the summary to take one description field. It predetermines whether to generate a detailed or non-detailed description based on whether tracing is enabled. This removes the need for a method that serves one or the other because all the usages of description I could find are used in trace log lines or explicitly check whether tracing is enabled to decide.
Hi @ijuma ,
I choose to always force evaluation of bodyAndSize in the Request.releaseBuffer method when the buffer is not null. You can see my comment from earlier this week about why this is necessary. All tests are passing and I think we're ready for another review.
While thinking about this lazy evaluation this week I think there's another potential optimization story that could be done. It may be possible never to evaluate the request at all while a quota is being enforced. However, not parsing the request will have cascading changes with how metrics are reported and requests logged, so I thought it prudent not to try and confuse the objective of this case with that. If you think this is worth exploring I could create a Jira ticket and look into it.
@ijuma WDYT? Thanks for your time to date reviewing this PR. It's appreciated.
I've rebased on trunk and squashed this branch.
This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please leave a comment asking for a review. If the PR has merge conflicts, update it with the latest from the base branch.
If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).
If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.
This PR has been closed since it has not had any activity in 120 days. If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open the PR and ask for a review.