Transaction task never completes after connection blip
Hello, I left a comment on #2630, but it was already closed, so I'm opening a new issue. We recently saw this happen again. From my previous post, which still holds:
We saw another instance where a Redis transaction hangs forever after a connection blip. We are using version 2.7.17.27058.
In the logs, I see this sequence of events:
07:20:22.719 - <log line right before we do the transaction>
07:20:22.990 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 4) on 10.10.70.233:6001/Interactive, Flushed/Faulted, last: GET, origin: ReadFromPipe, outstanding: 2, last-read: 4s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:23.012 - ConnectionRestored with args: 10.10.70.233:6001
07:20:26.558 - ConfigurationChanged with endpoint: 10.10.70.233:6001
07:20:30.259 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.232:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.231:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.269 - ConnectionRestored with args: 10.10.70.233:6003
07:20:30.269 - ConnectionRestored with args: 10.10.70.232:6002
07:20:30.269 - ConnectionRestored with args: 10.10.70.231:6003
07:20:30.270 - ConnectionRestored with args: 10.10.70.232:6001
07:20:30.270 - ConnectionRestored with args: 10.10.70.233:6002
07:20:30.275 - StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
---> Pipelines.Sockets.Unofficial.ConnectionResetException: An existing connection was forcibly closed by the remote host.
---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
at Pipelines.Sockets.Unofficial.Internal.Throw.Socket(Int32 errorCode) in /_/src/Pipelines.Sockets.Unofficial/Internal/Throw.cs:line 59
at Pipelines.Sockets.Unofficial.SocketAwaitableEventArgs.GetResult() in /_/src/Pipelines.Sockets.Unofficial/SocketAwaitableEventArgs.cs:line 74
07:20:30.292 - ConfigurationChanged with endpoint: 10.10.70.232:6001
This is what the transaction code looks like:
```csharp
ITransaction transaction = Database.CreateTransaction();
transaction.AddCondition(Condition.StringEqual(key1, 124));

Task<bool> task1 = transaction.KeyExpireAsync(key1, TimeSpan.FromMinutes(30));
Task<bool> task2 = transaction.KeyExpireAsync(key2, TimeSpan.FromMinutes(30));
Task<bool> task3 = transaction.KeyExpireAsync(key3, TimeSpan.FromMinutes(30));
Task<bool> task4 = transaction.KeyExpireAsync(key4, TimeSpan.FromMinutes(30));

if (!await transaction.ExecuteAsync())
{
    throw new Exception();
}

bool exp1 = await task1;
bool exp2 = await task2;
bool exp3 = await task3;
bool exp4 = await task4;

_logger.Verbose("{0} | {1} | {2} | {3}", exp1, exp2, exp3, exp4);
```
I took a process dump, and when I open it I see:
- A task is stuck on `transaction.ExecuteAsync()`. This task's status is `WaitingForActivation`.
- `task1`, `task2`, `task3`, and `task4` are all `Cancelled`.
- `transaction` appears to be `null`, suggesting it was garbage collected since it went out of scope.
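As a caller-side mitigation (not a fix for the underlying bug), each await can be bounded so an orphaned task surfaces as a `TimeoutException` instead of hanging forever. Below is a minimal, self-contained sketch using `Task.WaitAsync` (available since .NET 6); a never-completing `TaskCompletionSource` stands in for the hung transaction task, and the one-second budget is purely illustrative:

```csharp
using System;
using System.Threading.Tasks;

// A never-completing task stands in for the hung transaction task.
Task<bool> orphaned = new TaskCompletionSource<bool>().Task;

try
{
    // WaitAsync (.NET 6+) bounds the await itself; the timeout is illustrative.
    bool result = await orphaned.WaitAsync(TimeSpan.FromSeconds(1));
    Console.WriteLine($"completed: {result}");
}
catch (TimeoutException)
{
    // Reached after ~1s: the caller gives up instead of hanging forever.
    Console.WriteLine("timed out");
}
```

In the transaction code above, this would mean awaiting `transaction.ExecuteAsync().WaitAsync(budget)` and `taskN.WaitAsync(budget)`. Note that `WaitAsync` only abandons the wait; it does not cancel or complete the underlying operation.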
Some notes:
- We do NOT recreate the ConnectionMultiplexer.
- I can see in the process dump that we still have a reference to the ConnectionMultiplexer that we created originally.
I was able to gather some new information from the latest instance of this issue. I took a process dump and noticed 36 async logical stack traces in `PhysicalConnection.ReadFromPipe`. I took a look at each one, and in every case `PhysicalConnection.TransactionActive` was false. My best guess is that the transaction task was orphaned somehow.
Any thoughts on this?
We encountered a similar issue with calls to `StringGetWithExpiryAsync`: some of the calls just hang after the server experiences a short spike of timeouts.
We encountered this bug in our production environment with driver version 2.6.66. Upon analysis, we discovered a thread-safety issue in the `PhysicalConnection` class, specifically between the `RecordConnectionFailed` and `EnqueueInsideWriteLock` methods. If `RecordConnectionFailed` reaches line 483 while another thread executes `EnqueueInsideWriteLock`, the enqueued message is never processed, so the asynchronous task remains suspended indefinitely.
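If that is the mechanism, the hazard can be modelled in isolation. The sketch below is NOT the library's code: the method names mirror `RecordConnectionFailed` and `EnqueueInsideWriteLock` from the comment above, but the bodies are invented to show the losing interleaving, where an enqueue that lands after the failure path's drain leaves a `TaskCompletionSource` that nothing will ever complete:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var pending = new ConcurrentQueue<TaskCompletionSource<bool>>();

// Failure path: drain and fault everything queued so far, then tear down.
// Nothing reads the queue after this returns.
void RecordConnectionFailed()
{
    while (pending.TryDequeue(out var tcs))
        tcs.TrySetException(new InvalidOperationException("connection failed"));
}

// Writer path: a thread that raced past the failure check enqueues afterwards.
Task<bool> EnqueueInsideWriteLock()
{
    var tcs = new TaskCompletionSource<bool>();
    pending.Enqueue(tcs); // lands after the drain: nobody will complete it
    return tcs.Task;
}

// Force the losing interleaving deterministically: drain first, then enqueue.
RecordConnectionFailed();
Task<bool> hung = EnqueueInsideWriteLock();
await Task.Delay(200);
Console.WriteLine($"hung.IsCompleted = {hung.IsCompleted}"); // False: orphaned forever
```

A fix would typically make the drain and the enqueue mutually exclusive (e.g. fault and mark the connection dead under the same write lock the enqueue takes), so no message can be added after the final drain.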
@mgravell @NickCraver
I wonder if #2773 and #2786 are both related to this?
I may have a repro with the following unit test. If I "Run until failure" on it, it reliably fails in fewer than 20 runs. Sometimes it fails because of a timeout (a task appears to be hung), but interestingly I also see other exceptions bubble up, such as `InvalidOperationException` and `ArgumentOutOfRangeException`, which may be indicative of other bugs.
```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Xunit;
using Xunit.Abstractions;

namespace StackExchange.Redis.Tests.Issues;

public class Issue2742 : TestBase
{
    public Issue2742(ITestOutputHelper output, SharedConnectionFixture? fixture = null)
        : base(output, fixture) { }

    [Fact]
    public async Task Execute()
    {
        try
        {
            using var conn = Create(allowAdmin: true, shared: false, asyncTimeout: 1000);
            var server = GetServer(conn);
            var db = conn.GetDatabase();
            await db.PingAsync();

            RedisKey key = Me();
            await db.KeyDeleteAsync(key);

            const int numIter = 1000;
            List<Task<long>> incTasks = new(numIter);
            List<Task<bool>> txTasks = new(numIter);
            for (int i = 0; i < numIter; i++)
            {
                var tx = db.CreateTransaction();
                _ = tx.StringGetAsync(key);
                incTasks.Add(tx.StringIncrementAsync(key));
                txTasks.Add(tx.ExecuteAsync());
            }

            conn.AllowConnect = false;
            await Task.WhenAny(incTasks);
            server.SimulateConnectionFailure(SimulatedFailureType.All);

            TimeSpan timeout = TimeSpan.FromSeconds(10);
            Task timeoutTask = Task.Delay(timeout);
            Stopwatch sw = Stopwatch.StartNew();
            long max = -1;
            int exceptions = 0;
            foreach (Task<long> t1 in incTasks)
            {
                try
                {
                    Task completedTask = await Task.WhenAny(t1, timeoutTask).ForAwait();
                    Assert.Equal(t1, completedTask);
                    long result = await t1;
                    max = Math.Max(max, result);
                }
                catch (TaskCanceledException)
                {
                    exceptions++;
                }
                catch (RedisException)
                {
                    exceptions++;
                }
            }

            Log($"inc {max}, num ex = {exceptions}");
            Assert.Equal(1000, max + exceptions);

            foreach (Task<bool> txTask in txTasks)
            {
                Assert.True(txTask.IsCompleted);
            }

            sw.Stop();
            Log($"elapsed = {sw.Elapsed}");
            Assert.True(sw.Elapsed < timeout, "took too long");
        }
        finally
        {
            ClearAmbientFailures();
        }
    }
}
```
An example of an `ArgumentOutOfRangeException`:

```text
Message:
System.ArgumentOutOfRangeException : Specified argument was out of the range of valid values. (Parameter 'position')
Stack Trace:
ThrowHelper.ThrowArgumentOutOfRangeException_PositionOutOfRange()
ReadOnlySequence`1.GetPosition(Int64 offset, SequencePosition origin)
BufferReader.SnapshotPosition() line 123
BufferReader.TryConsumeAsBuffer(Int32 count, ReadOnlySequence`1& buffer) line 147
BufferReader.ConsumeAsBuffer(Int32 count) line 130
PhysicalConnection.ReadLineTerminatedString(ResultType type, ResultFlags flags, BufferReader& reader) line 2088
PhysicalConnection.TryParseResult(ResultFlags flags, Arena`1 arena, ReadOnlySequence`1& buffer, BufferReader& reader, Boolean includeDetilInExceptions, ServerEndPoint server, Boolean allowInlineProtocol) line 2164
PhysicalConnection.TryParseResult(Boolean isResp3, Arena`1 arena, ReadOnlySequence`1& buffer, BufferReader& reader, Boolean includeDetilInExceptions, PhysicalConnection connection, Boolean allowInlineProtocol) line 2135
PhysicalConnection.ProcessBuffer(ReadOnlySequence`1& buffer) line 1954
PhysicalConnection.ReadFromPipe() line 1897
Issue2742.Execute() line 61
--- End of stack trace from previous location ---
```