runtime
runtime copied to clipboard
SslStream intermittently returns zero byte read mid-stream in net8.0
Description
We have a console application that pulls data from a 3rd party service via an SslStream. Historically, this has worked flawlessly in production under net6.0 and net7.0. Upon re-targeting the application to net8.0, we noticed that SslStream.Read(...) is returning 0 mid-stream intermittently. There are times the code is able to read the entirety of the stream without issue and times when it will return 0 when there is more data left to read. To be sure, network streams can return 0 even when you anticipate more data (for a variety of reasons), but this particular anomaly of intermittent failures only occurs after targeting net8.0.
We also stumbled across this GitHub issue during our investigation, which may be tangentially (or directly) related: https://github.com/dotnet/runtime/issues/88792.
Reproduction Steps
The following code runs consistently and successfully under net6.0 and net7.0. It fails intermittently under net8.0. This code has been tested on linux-x64 and win-x64 with the same results. Furthermore, this scenario has been tested from several completely separate physical locations on different networks to help rule out any potential network-specific or infrastructure issues.
var buffer = new byte[length]; // length is known at runtime
var totalBytesRead = 0;
var timeout = DateTime.Now.AddMilliseconds(NET_READ_TIMEOUT); // 2 minute grace period to allow for any latency or other slowdowns.
while (totalBytesRead < length)
{
var bytesRead = _sslStream.Read(buffer, totalBytesRead, length - totalBytesRead);
if (bytesRead > 0)
{
totalBytesRead += bytesRead;
_logger.LogTrace("Read {0} of {1} expected bytes. Read this time: {2}", totalBytesRead, length, bytesRead);
timeout = DateTime.Now.AddMilliseconds(NET_READ_TIMEOUT);
}
else
{
if (DateTime.Now > timeout)
{
_logger.LogError("Received zero byte read for {0} of {1} expected bytes. After waiting {2}ms, no additional data was received.", totalBytesRead, length, NET_READ_TIMEOUT);
throw new InvalidOperationException("Show timed out receiving content.");
}
}
}
Example log output from above code on net6.0 and net7.0 (consistent success case):
[10:20:24 VRB] Read 29696 of 73667 expected bytes. Read this time: 29696
[10:20:24 VRB] Read 35840 of 73667 expected bytes. Read this time: 6144
[10:20:24 VRB] Read 38912 of 73667 expected bytes. Read this time: 3072
[10:20:24 VRB] Read 59392 of 73667 expected bytes. Read this time: 20480
[10:20:24 VRB] Read 62464 of 73667 expected bytes. Read this time: 3072
[10:20:24 VRB] Read 73667 of 73667 expected bytes. Read this time: 11203
Example log output from above code on net8.0 (intermittent success case):
[10:24:23 VRB] Read 2048 of 73667 expected bytes. Read this time: 2048
[10:24:23 VRB] Read 3072 of 73667 expected bytes. Read this time: 1024
[10:24:23 VRB] Read 5120 of 73667 expected bytes. Read this time: 2048
[10:24:23 VRB] Read 7168 of 73667 expected bytes. Read this time: 2048
[10:24:23 VRB] Read 11264 of 73667 expected bytes. Read this time: 4096
[10:24:23 VRB] Read 15360 of 73667 expected bytes. Read this time: 4096
[10:24:23 VRB] Read 18432 of 73667 expected bytes. Read this time: 3072
[10:24:23 VRB] Read 22528 of 73667 expected bytes. Read this time: 4096
[10:24:23 VRB] Read 27648 of 73667 expected bytes. Read this time: 5120
[10:24:23 VRB] Read 29696 of 73667 expected bytes. Read this time: 2048
[10:24:23 VRB] Read 39936 of 73667 expected bytes. Read this time: 10240
[10:24:23 VRB] Read 43008 of 73667 expected bytes. Read this time: 3072
[10:24:23 VRB] Read 47104 of 73667 expected bytes. Read this time: 4096
[10:24:23 VRB] Read 52224 of 73667 expected bytes. Read this time: 5120
[10:24:23 VRB] Read 56320 of 73667 expected bytes. Read this time: 4096
[10:24:23 VRB] Read 58368 of 73667 expected bytes. Read this time: 2048
[10:24:23 VRB] Read 59392 of 73667 expected bytes. Read this time: 1024
[10:24:23 VRB] Read 61440 of 73667 expected bytes. Read this time: 2048
[10:24:23 VRB] Read 62464 of 73667 expected bytes. Read this time: 1024
[10:24:23 VRB] Read 65536 of 73667 expected bytes. Read this time: 3072
[10:24:23 VRB] Read 66560 of 73667 expected bytes. Read this time: 1024
[10:24:23 VRB] Read 69632 of 73667 expected bytes. Read this time: 3072
[10:24:23 VRB] Read 72704 of 73667 expected bytes. Read this time: 3072
[10:24:23 VRB] Read 73667 of 73667 expected bytes. Read this time: 963
Example log output from above code on net8.0 (intermittent failure case):
[10:26:44 VRB] Read 40 of 73667 expected bytes. Read this time: 40
[10:26:44 VRB] Read 6184 of 73667 expected bytes. Read this time: 6144
[10:26:44 VRB] Read 8232 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 10280 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 13352 of 73667 expected bytes. Read this time: 3072
[10:26:44 VRB] Read 14376 of 73667 expected bytes. Read this time: 1024
[10:26:44 VRB] Read 17448 of 73667 expected bytes. Read this time: 3072
[10:26:44 VRB] Read 21544 of 73667 expected bytes. Read this time: 4096
[10:26:44 VRB] Read 26664 of 73667 expected bytes. Read this time: 5120
[10:26:44 VRB] Read 27688 of 73667 expected bytes. Read this time: 1024
[10:26:44 VRB] Read 28712 of 73667 expected bytes. Read this time: 1024
[10:26:44 VRB] Read 33832 of 73667 expected bytes. Read this time: 5120
[10:26:44 VRB] Read 35880 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 37928 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 42024 of 73667 expected bytes. Read this time: 4096
[10:26:44 VRB] Read 43048 of 73667 expected bytes. Read this time: 1024
[10:26:44 VRB] Read 44072 of 73667 expected bytes. Read this time: 1024
[10:26:44 VRB] Read 46120 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 48168 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 51240 of 73667 expected bytes. Read this time: 3072
[10:26:44 VRB] Read 53288 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 58408 of 73667 expected bytes. Read this time: 5120
[10:26:44 VRB] Read 60456 of 73667 expected bytes. Read this time: 2048
[10:26:44 VRB] Read 64552 of 73667 expected bytes. Read this time: 4096
[10:26:44 VRB] Read 68648 of 73667 expected bytes. Read this time: 4096
[10:26:44 VRB] Read 71720 of 73667 expected bytes. Read this time: 3072
[10:26:44 VRB] Read 72683 of 73667 expected bytes. Read this time: 963
[10:28:44 ERR] Received zero byte read for 72683 of 73667 expected bytes. After waiting 120000ms, no additional data was received.
An interesting observation is that every time we've noticed a failure scenario, it appears to happen sometimes after a read of less than 1024 bytes. Example above: 963 bytes were read while there were still 984 bytes left to consume. Not sure if this is significant or not, but felt compelled to call it out. This pattern has been observed in scenarios after just a couple of reads to near the end such as the example above.
Expected behavior
SslStream.Read(...) should consistently return a non-zero result in net8.0 as it does in net7.0 and net6.0 given the exact same conditions.
Actual behavior
SslStream.Read(...) does not consistently return a non-zero result in net8.0 as it does in net7.0 and net6.0 given the exact same conditions.
Regression?
This appears to only affect net8.0. Previous versions net6.0 and net7.0 appear to remain unaffected.
Known Workarounds
None.
Configuration
dotnet --version produces 8.0.400
Tested on: linux-x64 and win-x64
Other information
No response
Tagging subscribers to this area: @dotnet/ncl, @bartonjs, @vcsjones See info in area-owners.md if you want to be subscribed.
if you can isolate to small repro, can you get us core dump @ubercellogeek? I suspect this will highly depend on your particular environment and it will be diffuclket for us to investigate in isolation. If core is difficult can you dump _receivedEOF and various *Length properties on internal _buffer?
Happy to! Attached is a triage level core dump (during which the execution exhibited the anomaly in question). If you would like a different type, I can provide that as well. coredump.8532.dmp
@ubercellogeek I have problems investigating the attached dump (cannot walk the heap). The file is rather small, is it a full memory dump?
Do the subsequent reads after the 0 reads return nonzero? (it is not clear from the output whether this happens or not)
It would be good to add something like Debug.Assert(bytesRead > 0) and run the code in debug config (or use Environment.FailFast if you have to run in Release). This way we can examine the state of the SslStream right around the error state. Following envvars should also be used to guarantee full memory dump
# Enable dump collection
DOTNET_DbgEnableMiniDump=1
DOTNET_DbgMiniDumpType=MiniDumpWithFullMemory
DOTNET_DbgMiniDumpName="C:/path/to/coredump.%p"
@rzikm - To answer your question, no, subsequent reads do not return non-zero. I've re-run the app in Debug mode, added the Debug.Assert(bytesRead > 0) bit and generated a full dump (too large for attachment here, so here's the link to the dump: https://pdastorage.blob.core.windows.net/dumps/coredump.18272.dmp). Happy to provide more information/assets if you require.
In the provided dump, the _receivedEOF flag is set
0:000> !dumpobj /d 265d9446438
Name: System.Net.Security.SslStream
MethodTable: 00007ffa43c9c860
EEClass: 00007ffa43caee90
Tracked Type: false
Size: 256(0x100) bytes
File: C:\Program Files\dotnet\shared\Microsoft.NETCore.App\8.0.8\System.Net.Security.dll
Fields:
MT Field Offset Type VT Attr Value Name
0000000000000000 4001c63 8 ...ing.SemaphoreSlim 0 instance 0000000000000000 _asyncActiveSemaphore
00007ffa43a50018 4001c62 1408 System.IO.Stream 0 static 00000265d94465b8 Null
00007ffa43a50018 40001cb 10 System.IO.Stream 0 instance 00000265d9446400 _innerStream
00007ffa436dd070 40001cc 18 System.Boolean 1 instance 0 _leaveStreamOpen
00007ffa43a50730 400025c 20 ...ptionDispatchInfo 0 instance 0000000000000000 _exception
00007ffa436dd070 400025d 74 System.Boolean 1 instance 0 _shutdown
00007ffa436dd070 400025e 75 System.Boolean 1 instance 1 _handshakeCompleted
00007ffa43c9bf08 400025f 80 ...lStream+SslBuffer 1 instance 00000265d94464b8 _buffer
00007ffa43711188 4000260 58 System.Int32 1 instance 0 _nestedWrite
00007ffa43711188 4000261 5c System.Int32 1 instance 0 _nestedRead
00007ffa43cce6a8 4000262 28 ...enticationOptions 0 instance 00000265d9446538 _sslAuthenticationOptions
00007ffa43711188 4000263 60 System.Int32 1 instance 0 _nestedAuth
00007ffa436dd070 4000264 76 System.Boolean 1 instance 0 _isRenego
00007ffa43c9c3a0 4000265 a8 ...lper+TlsFrameInfo 1 instance 00000265d94464e0 _lastFrame
00007ffa43dd7c30 4000266 30 ...Private.CoreLib]] 0 instance 0000000000000000 _handshakeWaiter
00007ffa436dd070 4000267 77 System.Boolean 1 instance 1 _receivedEOF
00007ffa43711188 4000268 64 System.Int32 1 instance 0 _connectionOpenedStatus
00007ffa43cf44f0 4000269 38 ...feFreeCredentials 0 instance 00000265d944d5d8 _credentialsHandle
00007ffa43cffe18 400026a 40 ...eDeleteSslContext 0 instance 00000265d9449e38 _securityContext
00007ffa43c9c7a0 400026b d0 ...SslConnectionInfo 1 instance 00000265d9446508 _connectionInfo
00007ffa4388c370 400026c 48 ...s.X509Certificate 0 instance 00000265d94420f8 _selectedClientCertificate
00007ffa4388ccb0 400026d 50 ....X509Certificate2 0 instance 00000265d94538f8 _remoteCertificate
00007ffa436dd070 400026e 78 System.Boolean 1 instance 0 _remoteCertificateExposed
00007ffa43711188 400026f 68 System.Int32 1 instance 21 _headerSize
00007ffa43711188 4000270 6c System.Int32 1 instance 64 _trailerSize
00007ffa43711188 4000271 70 System.Int32 1 instance 16299 _maxDataSize
00007ffa43a50730 400025b a8 ...ptionDispatchInfo 0 static 00000265d9453f40 s_disposedSentinel
00007ffa43d2ffb0 4000272 b0 ....Cryptography.Oid 0 static 00000265d9453f80 s_serverAuthOid
00007ffa43d2ffb0 4000273 b8 ....Cryptography.Oid 0 static 00000265d9453fa8 s_clientAuthOid
Also there don't seem to be any data left in the internal buffers, this looks exactly like a SslStream from which all available data have been read.
Agreed on both points. Unfortunately, while running this exact same code pointed at net6.0 and net7.0 I cannot get the debug assertion to hit.
Do you have any other way to confirm that the data itself is actually being truncated?
Another thing that can be checked out is examining the network traffic via Wireshark (to see if the data really arrived), but that is possible only on Linux and (unfortunately on .NET 8) only with Debug build of System.Net.Security (the ability to use SSLKEYLOGFILE envvar to dump encryption secrets is not preset on publicly released builds). I can provide you this dll if you are interested, or you can try running the app on latest .NET 9 preview (again, on Linux, see https://github.com/dotnet/runtime/pull/100665 for steps).
what is the underlying stream? I've seen problems with 0 bytes reads in some implementations. I think starting with 8.0 SslStream does zero bytes reads to prevent early allocations.
what is the underlying stream?
From dump it seems to be classic NetworkStream.
Correct, should be a NetworkStream. My first thought @wfurt was perhaps this implementation somehow suffers from that zero byte read change in net8.0, but this is a fairly basic scenario...that being said, I could supply source if needed. I'd prefer not to publicly post here, however.
@rzikm I have confirmation that when this occurs, data is indeed truncated from the expected result.
It is surprising we would get EOF and then read more data from it. Perhaps wraps it and add some debug for reads. Since you have the dump open @rzikm, can you check that we have space e.g. _buffer.AvailableMemory is not empty?
// We either don't have full frame or we don't have enough data to even determine the size.
int bytesRead = await TIOAdapter.ReadAsync(InnerStream, _buffer.AvailableMemory, cancellationToken).ConfigureAwait(false);
Correct, should be a NetworkStream. My first thought @wfurt was perhaps this implementation somehow suffers from that zero byte read change in
net8.0, but this is a fairly basic scenario...that being said, I could supply source if needed. I'd prefer not to publicly post here, however.
You can reach me at my work mail radekzikmund at microsoft.com, we can continue the conversation privately over there.
@rzikm - sent an email your way.
I think I see the problem.
var mimeType = _streamIn?.ReadLine();
var length = Convert.ToInt32(_streamIn?.ReadLine());
var buffer = new byte[length];
var totalBytesRead = 0;
var timeout = DateTime.Now.AddMilliseconds(NET_READ_TIMEOUT);
while (totalBytesRead < length)
{
var bytesRead = _sslStream.Read(buffer, totalBytesRead, length - totalBytesRead);
When you are using StreamReader, you should not be reading from the underlying Stream on your own, because of StreamReaders internal buffering. By reading straight from the inner stream, you may be skipping any buffered data left in the StreamReaders internal buffer. So the file you downloaded gets truncated from the front. This does not happen deterministically because it is dependent on the timing between arrival of the data over the network.
The issue surfaces in .NET 8 probably because of some internal change in SslStream where we started processing multiple incoming TLS frames and return more data to the user per Read() call.
Since the textual parts of the protocol used when communicating with the server are delimited by newline characters, you should be able to replace the use of StreamReader by something like
private string ReadResponseLine()
{
StringBuilder sb = new StringBuilder();
while (_sslStream.ReadByte() is int c && c != '\n' && c != -1)
{
sb.Append((char)c);
}
return sb.ToString();
}
There is similar problem with using StreamWriter and writing to the underlying Stream as well, but as long as you call StreamWriter.Flush() before touching the underlying Stream there should not be a problem. Unfortunately, there is no way to "unread" the data of the inner Stream of StreamReader.
Ah, it seems we may have come to the same conclusion. After further looking it over last night, I was suspicious of the previous reads from the StreamReader, but that makes sense that the changes in net8.0 may have exposed the problem in a non-deterministic way. I'll test out today and report back my findings.
I can confirm now that removing the StreamReader bit is the fix. @rzikm and @wfurt, thank you very much for all your help in diagnosing this issue. We can consider this one resolved!