dotNext
DotNext.Net.Cluster: System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'length')
Discussed in https://github.com/dotnet/dotNext/discussions/243
Originally posted by LarsWithCA June 25, 2024
Hi @sakno,
Once in a while we get a series of these exceptions during startup (possibly after a restart/power-cycle), which prevents the cluster from getting fully up and running:
2024-06-24 11:36:24.9866|ERROR|DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer|Failed to process request from 192.168.100.154:52480|System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'length')
at System.IO.RandomAccess.SetLength(SafeFileHandle handle, Int64 length)
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.UnsealIfNeededAsync(Int64 truncatePosition, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.Partition.cs:line 647
--- End of stack trace from previous location ---
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.WriteThroughAsync(CachedLogEntry entry, Int32 index, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.Partition.cs:line 605
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.AppendUncachedAsync[TEntry](ILogEntryProducer`1 supplier, Int64 startIndex, Boolean skipCommitted, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 481
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.AppendAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 520
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.AppendAndCommitSlowAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 550
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 620
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at DotNext.Net.Cluster.Consensus.Raft.TransportServices.ConnectionOriented.Server.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/TransportServices/ConnectionOriented/Server.cs:line 142
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 134
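For reference, the top frame of the trace can be reproduced in isolation. The following is only a minimal sketch, not code from DotNext: it assumes .NET 7 or later (where `RandomAccess.SetLength` is available), and the class and method names are made up for illustration. It shows that `SetLength` rejects any negative length before touching the file, so the exception itself only tells us that a negative truncate position reached `UnsealIfNeededAsync`.

```csharp
using System;
using System.IO;
using Microsoft.Win32.SafeHandles;

// Hypothetical helper, not part of DotNext.
public static class NegativeLengthRepro
{
    // Returns "ok" if SetLength succeeds, otherwise the name of the rejected parameter.
    public static string TrySetLength(long length)
    {
        string path = Path.GetTempFileName(); // creates an empty temporary file
        try
        {
            using SafeFileHandle handle = File.OpenHandle(path, FileMode.Open, FileAccess.ReadWrite);
            try
            {
                RandomAccess.SetLength(handle, length);
                return "ok";
            }
            catch (ArgumentOutOfRangeException e)
            {
                return e.ParamName ?? string.Empty;
            }
        }
        finally
        {
            File.Delete(path); // the handle is already disposed at this point
        }
    }

    public static void Main()
    {
        // Value taken from the writeAddress observed in this report.
        Console.WriteLine(TrySetLength(-7426796073504137408L));
        // A small non-negative length for comparison.
        Console.WriteLine(TrySetLength(4096));
    }
}
```

The first call should report the `length` parameter, as in the log line above, while the second should succeed.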
Our setup:
- Linux ARM 32bit (little endian)
- .NET8 + DotNext.Net.Cluster 5.7.0
- Cluster with 6 nodes
- 10Hz communication frequency
- MemoryBasedStateMachine.Options:
  - recordsPerPartition = 50
  - BufferSize = 8192 * 64 * 10
  - InitialPartitionSize = 50 * 8192 * 10
  - CompactionMode = CompactionMode.Sequential
  - WriteMode = WriteMode.AutoFlush
  - CacheEvictionPolicy = LogEntryCacheEvictionPolicy.OnSnapshot
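To make the sizes above easier to read, here is a small sketch that just evaluates the expressions. It deliberately does not touch the DotNext API; the constant names mirror the option names but are plain local constants, and it assumes both sizes are expressed in bytes.

```csharp
using System;

// Hypothetical holder for the configured values; not a DotNext type.
public static class ConfiguredSizes
{
    public const int RecordsPerPartition = 50;
    public const int BufferSize = 8192 * 64 * 10;           // 5,242,880
    public const int InitialPartitionSize = 50 * 8192 * 10; // 4,096,000

    // Space per log entry if the initial partition size were split evenly.
    public static int InitialBytesPerRecord => InitialPartitionSize / RecordsPerPartition; // 81,920

    public static void Main()
    {
        Console.WriteLine($"BufferSize            = {BufferSize}");
        Console.WriteLine($"InitialPartitionSize  = {InitialPartitionSize}");
        Console.WriteLine($"InitialBytesPerRecord = {InitialBytesPerRecord}");
    }
}
```

Under that assumption, the initial partition size leaves roughly 80 KiB per record, far more than the 1425-byte entry involved in the failure.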
Here are the values of various arguments/variables/fields inside PersistentState.Table.WriteThroughAsync when the exception happens:
- index=1
- fileOffset=512
- FirstIndex=150
- LastIndex=199
- PartitionNumber=3
- footer.Length=2000
- entry.Length=1425
- GetMetadataBuffer(index - 1).Length=40
- writeAddress=LogEntryMetadata.GetEndOfLogEntry(GetMetadataBuffer(index - 1).Span)=-7426796073504137408
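The captured values can be cross-checked against each other with a bit of arithmetic. The sketch below is only illustrative and makes several assumptions that are not confirmed by the library: that `index` is relative to the partition (so the absolute index is `FirstIndex + index`), that `fileOffset=512` is the size of the partition header, and that the footer holds one 40-byte metadata slot per record. All names are hypothetical.

```csharp
using System;

// Hypothetical helper for checking the reported values; not part of DotNext.
public static class PartitionSanityCheck
{
    // Partition that an absolute log index falls into.
    public static long PartitionOf(long absoluteIndex, int recordsPerPartition)
        => absoluteIndex / recordsPerPartition;

    // First and last absolute index covered by a partition.
    public static (long First, long Last) Range(long partition, int recordsPerPartition)
        => (partition * recordsPerPartition, partition * recordsPerPartition + recordsPerPartition - 1);

    // A write address is only plausible if it lies at or after the assumed header.
    public static bool IsPlausibleWriteAddress(long address, long headerSize)
        => address >= headerSize;

    // Two's-complement hex view of the value, useful for spotting foreign bit patterns.
    public static string ToHex(long value) => value.ToString("X16");

    public static void Main()
    {
        const int recordsPerPartition = 50;
        const int metadataSize = 40;
        long absoluteIndex = 150 + 1; // FirstIndex + index (assumed relation)

        Console.WriteLine(PartitionOf(absoluteIndex, recordsPerPartition)); // compare with PartitionNumber=3
        Console.WriteLine(Range(3, recordsPerPartition));                   // compare with FirstIndex/LastIndex
        Console.WriteLine(recordsPerPartition * metadataSize);              // compare with footer.Length=2000

        long writeAddress = -7426796073504137408L;
        Console.WriteLine(IsPlausibleWriteAddress(writeAddress, 512));
        Console.WriteLine(ToHex(writeAddress));
    }
}
```

Under these assumptions the partition number, index range and footer length are all consistent with each other, and only the end-of-entry position read from the metadata slot of the previous entry (`index - 1`) is out of range. Looking at that value in hex may help judge whether the slot contains leftover or foreign bytes rather than valid metadata.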
Are there any other values I should try and capture that might help you investigate/solve this?
In the previous version we were running (5.5.0), in addition to this exception we also saw "System.ArgumentOutOfRangeException: Specified file length was too large for the file system." in the same area of the code. That one may no longer occur in the newest version.