NServiceBus.Transport.AzureServiceBus icon indicating copy to clipboard operation
NServiceBus.Transport.AzureServiceBus copied to clipboard

Incorrect routing of messages in ServiceBus

Open gusmanb opened this issue 2 years ago • 13 comments

When ServiceBus is used as transport and pub/sub is used the bundle-1 topic subscriptions are created with a SQL filter like this:

[NServiceBus.EnclosedMessageTypes] LIKE '%(full class name)%'

This is totally incorrect, suppose the case where you have these events:

MyNamespace.MyEvent
MyNamespace.MyEventTwo

As the filters uses the LIKE clausule, and even worse, percentages are explicitly added to the filter, all the MyEventTwo events will be also routed to all the queues registered for MyEvent which at the end causes a "Handler not found" error as there is no handler for such messages.

The fitlers shoud NEVER use a LIKE clausule, they must use equality comparers:

[NServiceBus.EnclosedMessageTypes] = '(full class name)'

gusmanb avatar Mar 10 '22 19:03 gusmanb

Hi @gusmanb

Thanks for reaching out to us and raising your concerns. I took the liberty of moving it to the correct repository. I can relate to your skepticism when you look at the current filter rule from a technical and puristic standpoint. You are indeed correct when defining event that have the same "base name" they will get routed due to the LIKE condition and the capturing % at the end.

This is though a feature of the topology that we explicitly designed for. The reason being that NServiceBus as a higher level framework, and as such it is trying to push you into the direction of declaring meaningful business events. MyEvent and MyEventTwo are rarely ever meaningful business events. When following the guidelines you would end up with event names like Shipping.OrderShipped or Sales.OrderAccepted which by the virtue of the naming doesn't make it likely to clash.

Let's take the above example to the next level. Assuming we have a situation like Sales.OrderAccepted and Sales.OrderAcceptedTwo (to follow your example). Sales.OrderAcceptedTwo would then fall under the same "baseName" of Sales.OrderAccepted and considered by NServiceBus as a version of the Sales.OrderAccepted event. Therefore, routing it to the subscriber of Sales.OrderAccepted enables evolving the contract without falling into the problem of assembly versioning.

Useful guidelines

danielmarbach avatar Mar 11 '22 11:03 danielmarbach

@gusmanb let me know if the current behavior of the Azure Service Bus transport poses a problem for one of your endpoints, and we might be able to find a suitable workaround for that specific endpoint. From a behavior perspective though in general this works as intended as I tried to highlight above, and I will be closing this issue as soon as I have received your feedback.

danielmarbach avatar Mar 11 '22 11:03 danielmarbach

Hi.

Yes, I agree that the example is not meaningful, in our real case we have "NativeMessages.ExternalCall" and "NativeMessages.ExternalCallResult" which are meaningful and caused the results to be routed to the services that receive the call. Of course that is very easy to solve, changing "NativeMessages.ExternalCall" to "NativeMessages.ExternalCallRequest" (what we already have done) solves the problem.

But, said that, if this is intended by design to force users to follow best practices, should not NServiceBus throw an error when it finds this situation at least when autosubscription is enabled? It could check the subscriptions in the "bundle" topic and if it is going to register a message that would collide with an already registered one then throw the error.

The main problem I see is that for someone that doesn't known this "feature" it's a lot of time wasted until you find what is happening, without an error message you are clueless and to be honest I can't find anywhere in the documentation that states that messages will be matched by partial class names, so what you expect is exact matches, not partial ones.

Also, if you consider the current status you have two pieces of the same system that don't follow the same policy, routing in ServiceBus matches partial class names, but NServiceBus itself matches handlers by exact class names, what as I see it is not good practice.

Cheers.

gusmanb avatar Mar 11 '22 12:03 gusmanb

@gusmanb

I agree that the example is not meaningful

Apologies, it wasn't my intention to play down your example. I truly understood it was a generic one for the discussion, and as such an excellent example to get started with.

Based on your feedback I have raised a documentation PR to make it visible on the topology documentation how things are behaving at the moment. I have also added there more details about why that also includes the stronger reason why we introduced it which is the polymorphic behavior (sorry it only occurred to me while I started writing the documentation update).

Regarding your concrete example. Thanks for that! When we talk to customers in solution architecture discussions, a scenario like you described is usually advised to be treated as a request/reply scenario and not a pub/sub scenario. The reason being it fits closer to the definition of

  • Used to make a request to perform an action.
  • Has one logical owner.
  • Should be sent to the logical owner.
  • The sender and the receiver have a close relationship ("coupling") and most of the time there is no interest in the result of the command by multiple subscribers. The interest usually lies in the business event that will be published by the command initiating service that receives the reply/result of the command handler

This would make the NativeMessages.ExternalCall a command (ICommand) and NativeMessages.ExternalCallResult a result (IMessage).

image

https://docs.particular.net/nservicebus/messaging/messages-events-commands

danielmarbach avatar Mar 11 '22 13:03 danielmarbach

Given the message type header contents, how would you suggest to construct the filter, @gusmanb?

SeanFeldman avatar Mar 11 '22 14:03 SeanFeldman

To add more details, how would you propose to handle SQL filters (correlation filters won't work, tested already) given that a message could be in a chain of inheritance and polymorphism has to be respected?

Here's an example of what an event type header (NServiceBus.EnclosedMessageTypes) value could look like:

Contracts.Events.Api.FileChanged, Contracts, Version=1.0.0.0, Culture=neutral, 
PublicKeyToken=null;Contracts.Events.Api.FileSyncEvent, Contracts, Version=1.0.0.0, Culture=neutral, 
PublicKeyToken=null;Contracts.Events.Api.SyncEvent, Contracts, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null

SeanFeldman avatar Mar 11 '22 20:03 SeanFeldman

Apologies, it wasn't my intention to play down your example.

Nothing to apologize for :)

Here's an example of what an event type header...

Ok, now I start to understand your problem, I had not checked the header contents but it starts making sense why you did the filter like these, to allow the register of a handler for a base class that would receive events of any child type.

Let's break the problem in two steps, delimiting the end and delimiting the start of the class name.

Delimiting the end of the class name should be easy, if the filter instead of '%(className)%' becomes '%(className),%' will solve this step, as the header includes the class assembly qualified name we can rely on that final comma at the end of the name to do the delimitation.

To delimit the start of the class name I see the header contains the class assembly qualified name and if I undersstood it correctlly its ancestors assembly qualified names joined by a semicolon with no spaces, if that's the case then we can rely in the beginning of the string or the semicolon using an OR: ... LIKE '(className),%' ... OR LIKE '%;(className),%'.

Basically a complete filter for a class named MyNamespace.MyEvent would look like this:

[NServiceBus.EnclosedMessageTypes] LIKE 'MyNamespace.MyEvent,%' OR [NServiceBus.EnclosedMessageTypes] LIKE '%;MyNamespace.MyEvent,%'

I'm out of the office so I can't test this until next week but I think that would solve all the problems at once, I will try to get some time on Monday to test it and check if it would work.

Cheers.

gusmanb avatar Mar 11 '22 22:03 gusmanb

You're correct about the two patterns and the ability to OR between those. There are two concerns there:

  1. Performance
  2. SQL Filter constraints

Performance

Let's have a look at the economics of this change. An event published to the topic (bundle-1) would be evaluated against each subscription (a subscription per each endpoint). The message would be assessed against each SQL filter for each subscription until a satisfied filter is found. Each filter evaluation is computed based on, abstracted by the service. A simple filter such as X LIKE Y is not as performant as X = Y but better than X LIKE Y OR X LIKE Z. On a system with a few endpoints and events, it's not a big deal. With a significantly larger system, with a substantial number of events, this can cause a namespace to crawl.

SQL Filter constraints

This one seems less of a concern in most scenarios but still needs to be considered. The maximum length of filter condition string per subscription/endpoint: 1,024 characters in total. Not a common scenario, but never say never.

SeanFeldman avatar Mar 12 '22 01:03 SeanFeldman

@gusmanb FYI the documentation PR also shows some header examples.

danielmarbach avatar Mar 12 '22 08:03 danielmarbach

@gusmanb FYI the https://github.com/Particular/docs.particular.net/pull/5660 also shows some header examples.

That looks really good, with that documentation in place to avoid a situation like that is just a matter of reading the docs :)

@SeanFeldman Yes, of course it would use more cpu resources, but I'm not totally sure how many resources would be and if it would really impact the namespace workload, ASB limits resources to your namespace based on credits, and filters take 10 credits no matter its complexity, so I woud assume that Microsoft has taken into account that complex filters could be created, but this is only a suposition.

Anyway, in any case what could at least be included in the filter is the final comma, that would reduce the posibility of missmatch greatly as it would discard extended names with no performance impact, is only a singe character more for the LIKE comparison.

Cheers.

gusmanb avatar Mar 12 '22 10:03 gusmanb

Anyway, in any case what could at least be included in the filter is the final comma, that would reduce the posibility of missmatch greatly as it would discard extended names with no performance impact, is only a singe character more for the LIKE comparison.

That's a good suggestion. Needs to be validated against native integrationnative integration scenario to ensure backwards compatibility.

SeanFeldman avatar Mar 12 '22 16:03 SeanFeldman

Including the final comma would break consumer driven contracts. It is also possible that the list of type names only contains one type and then there will be no semicolon which makes matching on the semicolon tricky too.

danielmarbach avatar Mar 13 '22 09:03 danielmarbach

And there we go, CDC.

AFAIR, we've looked into options in the past, and did simplify the filter, but couldn't get it to anything simpler than what it is today.

Another option @gusmanb would be taking control over subscription provisioning and create your own filters.

SeanFeldman avatar Mar 14 '22 00:03 SeanFeldman