redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

cloud_storage: Improve logging and error handling

Open Lazin opened this issue 9 months ago • 1 comments

Improve logging and error handling in the remote_partition. In the remote partition reader we're checking two abort_source instances. One belongs to the remote_partition instance and another belongs to the sharded_abort_source that belongs to the kafka request ctx. The last abort source is used to propagate exceptions from kafka layer. The abort_source in the remote_partition is not used for this purpose. It's only used to shutdown and therefore its 'check' method only throws 'abort_requested_exception'.

Because of that any kafka disconnection error triggers error level logging in TS. This commit changes behavior a bit. In case if the exception from the kafka layer is triggered by the disconnect the error is converted to simple shutdown error.

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [x] v24.1.x
  • [x] v23.3.x
  • [ ] v23.2.x

Release Notes

  • none

Lazin avatar Apr 29 '24 15:04 Lazin

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48431#018f2ace-10af-4fe4-95d6-09c4c597ff1a

vbotbuildovich avatar Apr 29 '24 18:04 vbotbuildovich

In this case the exception is just rethrown. Feel free to send another PR or push commits to this one of it's needed for release.

On Tue, Apr 30, 2024, 10:24 Andrea Barbadoro @.***> wrote:

@.**** commented on this pull request.

In src/v/cloud_storage/remote_partition.cc https://github.com/redpanda-data/redpanda/pull/18139#discussion_r1584330330 :

  •              auto reason = eptr.has_value()
    
  •                              ? net::is_disconnect_exception(*eptr)
    
  •                              : "shutdown";
    
  •              if (reason) {
    
  •                  vlog(
    
  •                    _ctxlog.debug,
    
  •                    "abort requested via config.abort_source: {}",
    
  •                    reason);
    
  •              } else {
    
  •                  vlog(
    
  •                    _ctxlog.debug,
    
  •                    "abort requested via config.abort_source");
    
  •              }
    
  •              if (_seg_reader) {
    
  •                  _partition->evict_segment_reader(std::move(_seg_reader));
    
  •              }
    

i think we are missing the case where there is an exception but it's not a disconnect_exception.

if we want to increase coverage,

if(!eptr.has_value()){ vlog(_ctxlog.debug, "abort requested via config.abort_source"); } else if(auto maybe_reason= net::is_disconnect_exception(*eptr); maybe_reason.has_value()){ vlog(_ctxlog.debug, "abort requested via config.abort_source: {}", *maybe_reason); } else { vlog(_ctxlog.debug, "abort requested via config.abort_source: {}", *eptr); }

— Reply to this email directly, view it on GitHub https://github.com/redpanda-data/redpanda/pull/18139#pullrequestreview-2030551129, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAWMNRLXAIXAFUEN7DU74TY75ILFAVCNFSM6AAAAABG6RQCCWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAMZQGU2TCMJSHE . You are receiving this because you authored the thread.Message ID: @.***>

Lazin avatar Apr 30 '24 08:04 Lazin

/backport v24.1.x

vbotbuildovich avatar May 02 '24 01:05 vbotbuildovich

/backport v23.3.x

vbotbuildovich avatar May 02 '24 01:05 vbotbuildovich