NServiceBus.SqlServer icon indicating copy to clipboard operation
NServiceBus.SqlServer copied to clipboard

Delayed Messages can be delivered up to ~60s later than intended because of a race condition.

Open LukeButters opened this issue 11 months ago • 2 comments

Describe the bug

Description

When sending a delayed message with a very low delay e.g. 1ms, the message is sometimes processed ~30s later.

This is on an endpoint that is not doing anything else.

Expected behavior

The delayed message should be processed fairly soon after 1ms.

Actual behavior

Messages are sometimes processed ~30s later

Versions

9.2.7

Steps to reproduce

  1. Start NSB
  2. wait ~30s
  3. Send a delayed message with a delay 1ms
  4. See that it does not get processed until 30s or so later on.

Relevant log output


Additional Information

Suspected Cause

This looks like a race condition between when the sql transaction is committed and when the in memory BackOffStrategy decides to look for a message.

For example when sending a delayed message with a delay of 1ms:

It looks as though when DelayedMessageTable.Store() is called, it first writes the message to the DB:

var messageRow = StoreDelayedMessageCommand.From(message.Headers, message.Body, dueAfter, destination);
using (var command = connection.CreateCommand())
{
    command.Transaction = transaction;
    command.CommandText = storeCommand;
    command.CommandType = CommandType.Text;

    messageRow.PrepareSendCommand(command);
    await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}

before the transaction is committed it calls:

OnStoreDelayedMessage?.Invoke(null, DateTime.UtcNow.Add(dueAfter));

which in turns calls RegisterNewDueTime(DateTime dueTime), which can result in checking for Delayed Messages within the next 1ms. It could look for this message before the transaction commits. Which results in no messages being found resulting in the delay before looking again increasing. This then results in a large delay before processing the message.

LukeButters avatar May 21 '25 03:05 LukeButters

Hey @LukeButters,

Looks like we might have a bug here. Before I get too excited though, I'd like to ask you a few questions:

  1. Were the peeksettings changed? https://docs.particular.net/transports/sql/design#behavior-queue-peek-settings
  2. Does your system make frequent use of delayed messages, or is this more of a sporadic thing? Sporadic would help explain why the backoff is getting to the 30second-ish range already.
  3. You mention that you're using version 9.2.7, but this is likely the NServiceBus version and not the NServiceBus.SqlServer version. What is the version of NserviceBus.SqlServer are you using?

I'll try create a repro internally so that we can triage this and hopefully get it fixed soon.

WilliamBZA avatar May 21 '25 10:05 WilliamBZA

  1. No I don't think the peek settings had changed.
  2. The endpoint has very few delayed messages.
  3. the version is:
<PackageReference Include="NServiceBus.Persistence.Sql" Version="8.1.2" PrivateAssets="build;buildTransitive" />
<PackageReference Include="NServiceBus.Transport.SqlServer" Version="8.1.8" />

LukeButters avatar May 27 '25 05:05 LukeButters