KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch
DelayedRemoteFetch uses the `fetch.max.wait.ms` config as the delay timeout for the DelayedRemoteFetchPurgatory. The purpose of `fetch.max.wait.ms` is to wait for the given amount of time when there is no data available to serve the FETCH request:

> The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes.

Using the same timeout in the DelayedRemoteFetchPurgatory can confuse users about how to configure an optimal value for each purpose. Moreover, the config is of LOW importance, so most users won't configure it and will use the default value of 500 ms.

Having a delay timeout of 500 ms in the DelayedRemoteFetchPurgatory can lead to a higher number of expired delayed remote fetch requests when the remote storage has any degradation.
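The core of the change can be sketched as follows. This is a simplified model, not the actual Kafka code; the function name and flag are illustrative:

```python
# Simplified model of choosing the delay timeout for a delayed remote fetch.
# Names and structure are illustrative, not the actual Kafka implementation.
FETCH_MAX_WAIT_MS = 500      # fetch.max.wait.ms default (LOW importance)
REQUEST_TIMEOUT_MS = 30_000  # server default request.timeout.ms

def delayed_remote_fetch_timeout_ms(use_fetch_request_timeout: bool) -> int:
    """Delay timeout handed to the DelayedRemoteFetchPurgatory."""
    if use_fetch_request_timeout:  # behaviour proposed by this patch
        return REQUEST_TIMEOUT_MS
    return FETCH_MAX_WAIT_MS       # behaviour before the patch

# A remote read taking 2 s expires under the old timeout but not the new one.
remote_read_ms = 2_000
print(remote_read_ms > delayed_remote_fetch_timeout_ms(False))  # True -> expired
print(remote_read_ms > delayed_remote_fetch_timeout_ms(True))   # False -> served
```

The point of the sketch: with the 500 ms delay, any remote read slower than `fetch.max.wait.ms` shows up as an expired delayed remote fetch, which is exactly the degradation scenario described above.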
Test: Existing unit and integration tests
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
Could you please help me understand how this change works with `fetch.max.wait.ms` from a user perspective, i.e. what happens when we are retrieving data from both local & remote in a single fetch call?

Also, wouldn't this change affect user clients? Asking because prior to this change users were expecting a guaranteed response within `fetch.max.wait.ms` = 500 ms, but now they might not receive a response until the 40s `request.timeout.ms`. If a user has configured their application timeouts according to `fetch.max.wait.ms`, this change will break their application.
> Could you please help me understand how this change works with fetch.max.wait.ms from a user perspective i.e. what happens when we are retrieving data from both local & remote in a single fetch call?
The `fetch.max.wait.ms` timeout is applicable only when there is not enough data (`fetch.min.bytes`) to respond to the client. This is a special case where we are reading the data from both local and remote: the FETCH request has to wait for the tail latency, which is the combined latency of reading from both local and remote storage.

Note that we always read from only one remote partition, up to `max.partition.fetch.bytes`, even though there is bandwidth available in the FETCH response (`fetch.max.bytes`), and the client rotates the partition order in the next FETCH request so that the next partitions are served.
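The partition rotation described above can be sketched as follows (an illustrative model, not the client's actual code):

```python
from collections import deque

def next_fetch_order(partitions):
    """Rotate the partition order so the one served first goes to the back."""
    d = deque(partitions)
    d.rotate(-1)
    return list(d)

# Only the first remote partition is read per FETCH (up to
# max.partition.fetch.bytes); rotating lets the others be served next time.
print(next_fetch_order(["tp-0", "tp-1", "tp-2"]))  # ['tp-1', 'tp-2', 'tp-0']
```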
> Also, wouldn't this change user clients? Asking because prior to this change users were expecting a guaranteed response within fetch.max.wait.ms = 500ms but now they might not receive a response until 40s request.timeout.ms. If the user has configured their application timeouts according to fetch.max.wait.ms, this change will break my application.
`fetch.max.wait.ms` doesn't guarantee a response within this timeout. The client expires the request only when it exceeds the `request.timeout.ms` of 30 seconds (default). The time taken to serve the FETCH request can be higher than `fetch.max.wait.ms` due to a slow hard disk, sector errors in the disk, and so on.
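The client-side expiry described here can be modelled as follows (a sketch of the claim, not the consumer's actual code):

```python
FETCH_MAX_WAIT_MS = 500      # how long the broker may block waiting for data
REQUEST_TIMEOUT_MS = 30_000  # client default; the only hard client-side expiry

def client_gives_up(elapsed_ms: int) -> bool:
    # The client does not expire a FETCH at fetch.max.wait.ms; a slow disk
    # can legitimately push the response well past that value.
    return elapsed_ms > REQUEST_TIMEOUT_MS

print(client_gives_up(2_000))   # False: past fetch.max.wait.ms, still waiting
print(client_gives_up(31_000))  # True: exceeded request.timeout.ms
```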
The FetchRequest.json doesn't expose the client-configured request timeout, so we are using the default server request timeout of 30 seconds. Otherwise, we could introduce one more config, `fetch.remote.max.wait.ms`, to define the delay timeout for DelayedRemoteFetch requests. We need to decide whether to keep this config on the client or the server, since the server operator may need to tune it for all clients if the remote storage degrades and the latency to serve remote FETCH requests is high.
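If the separate-config route were taken, a server-side setting might look like the fragment below. This is hypothetical: `fetch.remote.max.wait.ms` is only proposed in this discussion, and the value shown is made up for illustration.

```properties
# Hypothetical config sketched in the comment above -- not a real Kafka config.
# Kept server-side so the operator can raise it for every client at once when
# the remote storage degrades. The 40000 ms value is purely illustrative.
fetch.remote.max.wait.ms=40000
```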
I understand the problem you're trying to solve, but using the server default request timeout doesn't make sense to me. It breaks the contract of the fetch protocol that `fetch.max.wait.ms` will not be exceeded when there is insufficient data in the "local" log. I understand the remote read is something of a grey area regarding whether "the data exists or not", but we have to admit some users might feel surprised when their fetch doesn't respond within `fetch.max.wait.ms`. Ideally, we should introduce another config for this remote-read waiting purpose, instead of re-using the request timeout.
This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch)
If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.
Correction: for this:

> some users might feel surprised when their fetch doesn't respond in fetch.max.wait.ms time.

This is wrong. Even if the remote read has not completed yet, the fetch request will still return within `fetch.max.wait.ms`. It's just an empty response.
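The corrected behaviour can be modelled as follows (a sketch of the claim above, not Kafka code; the return values are illustrative):

```python
def fetch_response(remote_read_done: bool, waited_ms: int,
                   fetch_max_wait_ms: int = 500):
    """Model of the corrected behaviour: the broker answers within
    fetch.max.wait.ms even if the remote read is still in flight."""
    if remote_read_done:
        return ["records"]   # data arrived in time
    if waited_ms >= fetch_max_wait_ms:
        return []            # empty response; the client simply polls again
    return None              # still within the wait window

print(fetch_response(remote_read_done=False, waited_ms=500))  # []
print(fetch_response(remote_read_done=True, waited_ms=100))   # ['records']
```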
@showuon @clolov @jeqo
The patch is ready for review. PTAL.
Will open a separate PR to emit the RemoteLogReader `FetchRateAndTimeMs` metric.
@kamalcph , could we move the dynamic config change into another PR? I have some comments to it, but that is separate from the original changes.
Failed tests are unrelated.