Brighter icon indicating copy to clipboard operation
Brighter copied to clipboard

Unhandled exceptions are not routed to DLQ

Open jakoss opened this issue 5 months ago • 18 comments

Describe the bug

I'm going through the docs here: https://brightercommand.gitbook.io/paramore-brighter-documentation/brighter-request-handlers-and-middleware-pipelines/handlerfailure#terminate-processing-of-that-request

If i understand correctly, when i throw an exception like this:

public class TestEventMessageHandler(ILogger<TestEventMessageHandler> logger) : RequestHandlerAsync<TestEvent>
{
    public override Task<TestEvent> HandleAsync(TestEvent message, CancellationToken cancellationToken = default)
    {
        logger.LogInformation("Test event ({Id}) received with name: {Name}", message.Id, message.Name);
        throw new InvalidOperationException("test");
        return await base.HandleAsync(message, cancellationToken).ConfigureAwait(ContinueOnCapturedContext);
    }
}

The message should be rejected and passed to DLQ. But this is not what i see in the output. Logs are as follows:

info: Paramore.Brighter.MessagingGateway.RMQ.RmqMessageConsumer[0]
      RmqMessageConsumer: Received message from queue TestTopic with routing key TestTopic via exchange TestExternalEventsExchange on subscription amqp://*****@localhost:9994/, message: {"header":{"timeStamp":"2025-07-15T09:25:44Z","id":"d0c13acd-67fb-41e7-afb5-12be5e71aaac","topic":"TestTopic","messageType":"MT_EVENT","bag":{"deliveryTag":1,"redelivered":true},"handledCount":0,"delayedMilliseconds":0,"correlationId":"00000000-0000-0000-0000-000000000000","contentType":"text/plain","replyTo":null,"partitionKey":"","telemetry":null},"body":{"bytes":"eyAibmFtZSIgOiAidGVzdCIgfQ==","contentType":null,"characterEncoding":"UTF8","value":"{ \u0022name\u0022 : \u0022test\u0022 }"},"id":"d0c13acd-67fb-41e7-afb5-12be5e71aaac","deliveryTag":1,"redelivered":true,"persist":true}
info: Paramore.Brighter.CommandProcessor[0]
      Building send async pipeline for event: Test.FacadePlayground.Events.TestEvent 6d1f2d57-49bb-4f12-8842-15f59570b975
info: Paramore.Brighter.CommandProcessor[0]
      Found 1 async pipelines for event: Test.FacadePlayground.Events.TestEvent 6d1f2d57-49bb-4f12-8842-15f59570b975
info: Test.FacadePlayground.Events.TestEventMessageHandler[0]
      Test event (6d1f2d57-49bb-4f12-8842-15f59570b975) received with name: test
fail: Paramore.Brighter.ServiceActivator.MessagePump[0]
      MessagePump: Failed to dispatch message d0c13acd-67fb-41e7-afb5-12be5e71aaac from TestTopic on thread # 19
      System.InvalidOperationException: test
         at Test.FacadePlayground.Events.TestEventMessageHandler.HandleAsync(TestEvent message, CancellationToken cancellationToken) in /Users/jakubsyty/Projects/Test/Test.Backend.Framework/Tests/Test.FacadePlayground/Events/TestEvent.cs:line 16
         at Test.Framework.Messaging.Abstraction.MessageHandler`1.HandleAsync(TMessage command, CancellationToken cancellationToken) in /Users/jakubsyty/Projects/Test/Test.Backend.Framework/Test.Framework.Messaging/Abstraction/MessageHandler.cs:line 12
         at Paramore.Brighter.CommandProcessor.PublishAsync[T](T event, Boolean continueOnCapturedContext, CancellationToken cancellationToken) in /_/src/Paramore.Brighter/CommandProcessor.cs:line 457
info: Paramore.Brighter.MessagingGateway.RMQ.RmqMessageConsumer[0]
      RmqMessageConsumer: Acknowledging message d0c13acd-67fb-41e7-afb5-12be5e71aaac as completed with delivery tag 1

So i dove in to MessagePump code here: https://github.com/BrighterCommand/Brighter/blob/release/9X/src/Paramore.Brighter.ServiceActivator/MessagePump.cs

And the only RejectMessage invocation i can see there is for ConfigurationException. Am i missing something? Should i look at some other place? I want to trace the message coming into my system and why it is not redirected to DLQ. The DLQ settings for exchange and subscriptions are properly set up.

To Reproduce

For now please give me some guidance on where can i trace the message, if i won't be able to track it myself - i'll try to create some repro project.

Further technical details

  • Brighter version - 9.9.10
  • Include the output of dotnet --info
.NET SDK:
 Version:           8.0.411
 Commit:            f97ff31961
 Workload version:  8.0.400-manifests.2ba2f75d
 MSBuild version:   17.11.31+933b72e36

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  15.5
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/8.0.411/

.NET workloads installed:
Configured to use loose manifests when installing new manifests.
There are no installed workloads to display.

Host:
  Version:      8.0.17
  Architecture: arm64
  Commit:       77545d6fd5

.NET SDKs installed:
  8.0.411 [/usr/local/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 8.0.17 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 8.0.17 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  DOTNET_ROOT       [/usr/local/share/dotnet]

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download
  • The OS (Window, Linux, MacOs, etc) - MacOs

jakoss avatar Jul 15 '25 09:07 jakoss

To not acknowledge the message , you should use throw DeferAction, so Brighter will requeue this message.

Maybe we should invert it, have a special throw to reject a message.

Thinking better, I guess the best would be to have a new property in Subscription for what to do in case of exception, reject or requeue, it could be a function

lillo42 avatar Jul 17 '25 17:07 lillo42

I think the third option is the best, just leave it to user. But either way - current documentation is misleading. It clearly states that unhandled exception goes into DLQ. And i think this is expected behavior for any person working on RabbitMQ, DLQ is there so nothing gets lost

jakoss avatar Jul 18 '25 08:07 jakoss

For AWS SNS/SQS if you have a configure DLQ we are requeuing it, we should do the same for RabbitMQ

lillo42 avatar Jul 18 '25 08:07 lillo42

Oh, ok, i just looked at MessagePump so i thought the logic was provider independent

jakoss avatar Jul 18 '25 08:07 jakoss

Maybe it should be provider independent, at some point somebody change it on provider level instead of the in the MessagePump I think on V11 we can change this logic (V10 is almost out I don't want to delay it more)

lillo42 avatar Jul 18 '25 08:07 lillo42

I think it's fine to not touch it until next versions, but the documentation probably should be updated for now. Because if i didn't check if DLQ is working - this could lead to serious problems at some point :)

jakoss avatar Jul 18 '25 08:07 jakoss

So we don't react to application errors from a handler by pushing the message into an Error Queue or DLQ. I understand some other frameworks choose to do that, but it's not our approach. An error within a handler has one of two possible outcomes:

  • This error is not transient. You should log this error, and we will ack the message, as it cannot be processed, and the either the queue will delete it or the stream offset will advance past it.
  • This error is likely transient. You should throw DeferAction and we will retry. If we have retried at the limit of retries, it will be forwarded to the DLQ as a possible poison pill i.e. it does not seem to be working despite retries.

Instead of an error queue, I recommend that if you need to resend the message in the future you clear the dispatched timestamp on the outbox of the service that sent this message.

iancooper avatar Jul 18 '25 10:07 iancooper

At some points frameworks have opinions. We follow folks like Gregor Hohpe here in not believing in generic "error queues."

iancooper avatar Jul 18 '25 10:07 iancooper

However, @jakoss you are right, this documentation is confusing:

`On and Exernal Bus if your middleware supports a Dead Letter Queue (DLQ), and it is configured in your subscription, when we reject a message it will be copied to the DLQ.

On an External Bus, to prevent discarding too many messages, you can set an Subscription.UnacceptableMessageLimit. If the number of messages terminated due to unhandled exceptions equals or exceeds this limit, the message pump processing the External Bus will terminate.`

As it misses a couple of paragraphs that explain the issue with how you handle transient and not transient and we should fix the docs

iancooper avatar Jul 18 '25 10:07 iancooper

I've tagged this as a documentation issue

iancooper avatar Jul 18 '25 10:07 iancooper

That's all perfectly fine, especially when we can handle this ourselves by pipelines. Thanks

jakoss avatar Jul 18 '25 10:07 jakoss

That's all perfectly fine, especially when we can handle this ourselves by pipelines. Thanks

Yeah, folks who want this can use our fallback handler to post to an error queue. I mean if you want to contribute middleware to do that, I am not adverse to including it for folks who want to have an error queue

iancooper avatar Jul 19 '25 18:07 iancooper

I'll talk to the customer about that, I think they have some requirements here that are regulated, but maybe we can approach it some other way

jakoss avatar Jul 20 '25 12:07 jakoss

Cool, I'm always happy to add some middleware for folks that want to do this; I just want to talk them out of it first.

iancooper avatar Jul 20 '25 13:07 iancooper

We will provide some options here, as part of the universal DLQ work, so marking as In Progress

iancooper avatar Dec 01 '25 08:12 iancooper

We will provide some options here, as part of the universal DLQ work, so marking as In Progress

Do you mean you are implementing some kind of universal DLQ for all providers? Or is that just adjustments around rabbitmq?

jakoss avatar Dec 01 '25 09:12 jakoss

@jakoss So ensuring that you have access to a DLQ (or Invalid Message Channel) and can explicitly push to it, even where not supported by the provider. This PR is just the ADR for now. Feel free to review and give feedback before I start working on it.

iancooper avatar Dec 01 '25 10:12 iancooper

This seems great, exactly what we need

jakoss avatar Dec 01 '25 11:12 jakoss