[BUG] SSL connection cannot be established at random intervals
When I am running a lot of provisioning templates (in my case ~400) I get the error:
195 | Invoke-PnPSiteTemplate -Path ".[my Path] verfolgen …
    | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    | The SSL connection could not be established, see inner exception.
That occurs randomly every 2 to 40 executions. If I execute the same script again, the same thing happens with different templates. So rerunning the failed ones narrows it down after a few executions, but looking for each one that has failed is kind of annoying.
@atomroflman : if you look at the inner exception can you then identify a common api call that throws this error or is it more at random?
@jansenbe I have seen the same thing on a few occasions the last few weeks at least. This has occurred when running provisioning of large templates using just PnP Framework, so no PnP PowerShell and I always thought it was just a temporary thing but after reading this I did some testing.
tl;dr
There seem to be some issues with how requests are handled server side when they should probably be throttled instead. The feeling is that either you get the "SSL connection cannot be established" error quite soon, or you get "(400) Bad Request" later on. That is also the impression I have from our production solutions where I have seen this. The question is whether this is something that should be fixed server side or something that must be handled in PnP Framework?
Detailed test examples
I made a simple console app that just loops and reads the title from the connected web:
var authManager = new AuthenticationManager(clientId, certificate, tenantId);
var clientContext = await authManager.GetContextAsync(siteUrl);
for (int i = 0; i < 10000; i++)
{
    clientContext.Load(clientContext.Web, w => w.Title);
    await clientContext.ExecuteQueryRetryAsync();
    Console.WriteLine($"{i} - {clientContext.Web.Title}");
}
Every time I ran this it got to somewhere between request 1100 and 1400, and then the "SSL connection cannot be established" exception was thrown. The inner exception is "An existing connection was forcibly closed by the remote host." I have tried it with different tenants with the same result; each and every time I ran it I got the exception somewhere between request 1100 and 1400, sometimes even sooner.
I have also tested with just using the default ExecuteQueryAsync with the same result.
It seems like it could have something to do with throttling and it seems very similar to SharePoint/sp-dev-docs#6955 which I have also seen within the last week or so.
If I then modify my test app to use the new HttpClientWebRequestExecutorFactory from PnP Framework I cannot recreate it; it runs through all 10000 requests without any issues. I have run it over and over without seeing the exception. This does not, however, seem to help for the much rarer occurrences of the 400 Bad Request errors from the issue linked above.
var authManager = new AuthenticationManager(clientId, certificate, tenantId);
var clientContext = await authManager.GetContextAsync(siteUrl);
clientContext.WebRequestExecutorFactory = new HttpClientWebRequestExecutorFactory(PnPHttpClient.Instance.GetHttpClient());
for (int i = 0; i < 10000; i++)
{
    clientContext.Load(clientContext.Web, w => w.Title);
    await clientContext.ExecuteQueryRetryAsync();
    Console.WriteLine($"{i} - {clientContext.Web.Title}");
}
I then also did the same with a .NET Framework console app using PnP Sites Core and SharePointOnlineCredentials, and with this it took 6329 requests before I got "The remote server returned an error: (400) Bad Request."
using (var clientContext = new ClientContext(siteUrl))
{
    clientContext.Credentials = new SharePointOnlineCredentials(userName, password);
    for (int i = 0; i < 10000; i++)
    {
        clientContext.Load(clientContext.Web, w => w.Title);
        await clientContext.ExecuteQueryRetryAsync();
        Console.WriteLine($"{i} - {clientContext.Web.Title}");
    }
}
I can provide a sample error from our logs - here it happens while creating a folder, but it happens in different functions within the PnP apply as well - for example while creating views for lists or adding lookup fields to lists. Therefore I think it's not related to a specific PnP action. My first intention was to handle it the same way as a 429, but it's somewhat difficult to know whether my context is still valid.
{
"$type": "System.IO.IOException, System.Private.CoreLib",
"ClassName": "System.IO.IOException",
"Message": "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..",
"Data": {
"$type": "System.Collections.ListDictionaryInternal, System.Private.CoreLib"
},
"InnerException": {
"$type": "System.Net.Sockets.SocketException, System.Net.Primitives",
"ClassName": "System.Net.Sockets.SocketException",
"Message": "An existing connection was forcibly closed by the remote host.",
"Data": null,
"InnerException": null,
"HelpURL": null,
"StackTraceString": " at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)",
"RemoteStackTraceString": null,
"RemoteStackIndex": 0,
"ExceptionMethod": null,
"HResult": -2147467259,
"Source": "System.Net.Sockets",
"WatsonBuckets": null,
"NativeErrorCode": 10054
},
"HelpURL": null,
"StackTraceString": " at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)\r\n at System.Net.Security.SslStream.FillBufferAsync[TReadAdapter](TReadAdapter adapter, Int32 minSize)\r\n at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)\r\n at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count)\r\n at System.Net.Http.HttpConnection.Fill()\r\n at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.Read(Span`1 buffer)\r\n at System.Net.Http.HttpBaseStream.Read(Byte[] buffer, Int32 offset, Int32 count)\r\n at System.IO.Compression.DeflateStream.ReadCore(Span`1 buffer)\r\n at System.IO.Compression.DeflateStream.Read(Byte[] array, Int32 offset, Int32 count)\r\n at System.IO.StreamReader.ReadBuffer(Span`1 userBuffer, Boolean& readToUserBuffer)\r\n at System.IO.StreamReader.ReadSpan(Span`1 buffer)\r\n at System.IO.StreamReader.Read(Char[] buffer, Int32 index, Int32 count)\r\n at Microsoft.SharePoint.Client.TextPeekReader.FillBuffer()\r\n at Microsoft.SharePoint.Client.TextPeekReader.Peek(Int32 position)\r\n at Microsoft.SharePoint.Client.JsonReader.SkipWhiteSpace()\r\n at Microsoft.SharePoint.Client.JsonReader..ctor(TextReader reader, ClientRuntimeContext runtimeContext)\r\n at Microsoft.SharePoint.Client.ClientRequest.ProcessResponseStream(Stream responseStream)\r\n at Microsoft.SharePoint.Client.ClientRequest.ProcessResponse()\r\n at Microsoft.SharePoint.Client.ClientRequest.ExecuteQueryToServerAsync(ChunkStringBuilder sb)\r\n at Microsoft.SharePoint.Client.ClientRequest.ExecuteQueryAsync()\r\n at Microsoft.SharePoint.Client.ClientRuntimeContext.ExecuteQueryAsync()\r\n at Microsoft.SharePoint.Client.ClientContext.ExecuteQueryAsync()\r\n at Microsoft.SharePoint.Client.ClientContextExtensions.ExecuteQueryImplementation(ClientRuntimeContext clientContext, Int32 retryCount, String userAgent)\r\n at Microsoft.SharePoint.Client.ClientContextExtensions.ExecuteQueryRetry(ClientRuntimeContext clientContext, Int32 retryCount, String userAgent)\r\n at PnP.Framework.Provisioning.ObjectHandlers.ObjectListInstance.CreateFolderInList(ListInfo list, Folder parentFolder, Folder folder, TokenParser parser, PnPMonitoredScope scope)\r\n at PnP.Framework.Provisioning.ObjectHandlers.ObjectListInstance.CreateFolderInList(ListInfo list, Folder parentFolder, Folder folder, TokenParser parser, PnPMonitoredScope scope)\r\n at PnP.Framework.Provisioning.ObjectHandlers.ObjectListInstance.CreateFolderInList(ListInfo list, Folder parentFolder, Folder folder, TokenParser parser, PnPMonitoredScope scope)\r\n at PnP.Framework.Provisioning.ObjectHandlers.ObjectListInstance.ProcessFolders(Web web, TokenParser parser, PnPMonitoredScope scope, ListInfo list)\r\n at PnP.Framework.Provisioning.ObjectHandlers.ObjectListInstance.ProvisionObjects(Web web, ProvisioningTemplate template, TokenParser parser, ProvisioningTemplateApplyingInformation applyingInformation)\r\n at PnP.Framework.Provisioning.ObjectHandlers.SiteToTemplateConversion.ApplyRemoteTemplate(Web web, ProvisioningTemplate template, ProvisioningTemplateApplyingInformation provisioningInfo, Boolean calledFromHierarchy, TokenParser tokenParser)\r\n at Microsoft.SharePoint.Client.WebExtensions.ApplyProvisioningTemplate(Web web, ProvisioningTemplate template, ProvisioningTemplateApplyingInformation applyingInformation)",
"RemoteStackTraceString": null,
"RemoteStackIndex": 0,
"ExceptionMethod": null,
"HResult": -2146232800,
"Source": "System.Net.Sockets",
"WatsonBuckets": null
}
The log is from code running in an Azure Function in the same data center as the SharePoint farm it connects to.
@atomroflman / @patrikhellgren / @czullu : does anyone have a recent correlation id and tenant name for a failed request? With that I can pull the ULS logs and hopefully get a better understanding of what causes the error.
@jansenbe I just reproduced the exception once again and this time it occurred after just 8 requests using my test console app from my previous comment. Unfortunately since it is the remote host closing the connection there is no correlation Id present in the exception, at least I cannot find one. I am testing this against a dev tenant and I am the only one using it so there shouldn't be that many requests to check if you wanted to go through the ULS. The exception occurred sometime between 6:05pm and 6:15pm in the tenant kebnesherpas3.sharepoint.com. I will try to reproduce the 400 Bad Request exception again since I believe that one contains a correlation Id. I will comment again with that info.
@jansenbe I figured I would log the ClientContext.TraceCorrelationId together with each request and exception to see if that might help. When reproducing the SSL connection exception again, the TraceCorrelationId was f9b0c39f-80b4-2000-d2fd-548d574a97b2, but that was the same id as for the previous request, so it had not been updated; if you find that one in the logs the error should be shortly after it.
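For anyone wanting to do the same, a minimal sketch based on the test loop from my earlier comment - the try/catch and message format are just illustrative:
for (int i = 0; i < 10000; i++)
{
    try
    {
        clientContext.Load(clientContext.Web, w => w.Title);
        await clientContext.ExecuteQueryRetryAsync();
        // Log the correlation id CSOM reports for the last processed request.
        Console.WriteLine($"{i} - {clientContext.Web.Title} - {clientContext.TraceCorrelationId}");
    }
    catch (Exception ex)
    {
        // After a connection-level failure the TraceCorrelationId is typically still the
        // one from the last successful request, as noted above.
        Console.WriteLine($"{i} - FAILED - {clientContext.TraceCorrelationId} - {ex.Message}");
        throw;
    }
}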
I also managed to reproduce the 400 Bad Request error and there I could get to the response headers and the Id and time headers were:
SPRequestGuid:35b1c39f-c00f-2000-bbe5-7399bb563ac9
request-id:35b1c39f-c00f-2000-bbe5-7399bb563ac9
Date:Sat, 01 May 2021 17:07:42 GMT
The ClientContext.TraceCorrelationId after this request was 35b1c39f-0002-2000-bd30-327ccda73dd4 but again this was probably after the last successful request.
I changed #301 to retry on SocketError. My test used a loop like the one above, but incremented a list item counter so I could see whether the update had been written even when the response failed; it ran successfully 5 times up to 1000 iterations.
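The actual change is inside ExecuteQueryRetry, but as a rough standalone sketch the idea looks like this - the extension method name, retry count and delays are only illustrative, not what #301 does verbatim:
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;
using Microsoft.SharePoint.Client;

public static class SocketRetryExtensions
{
    // Retry when the failure is a connection reset / socket error rather than a 429.
    public static async Task ExecuteQueryWithSocketRetryAsync(this ClientContext context, int maxRetries = 5)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                await context.ExecuteQueryAsync();
                return;
            }
            catch (Exception ex) when (
                attempt < maxRetries &&
                (ex is IOException || ex is SocketException || ex.InnerException is SocketException))
            {
                // Exponential backoff before retrying. As noted above it is hard to know
                // whether the previous request was already applied server side, so this
                // is only safe for operations that can be repeated.
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}
Calling context.ExecuteQueryWithSocketRetryAsync() in a loop like the one above would then retry a reset connection a few times before giving up.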
I see bad request responses (http 400) from CSOM calls as reported here (https://github.com/SharePoint/sp-dev-docs/issues/6955).
E.g. this: SPRequestGuid: 5f6e44cc-239d-4254-a80c-a5c95c5d06b9 Date: Mon, 03 May 2021 01:17:08 GMT
@patrikhellgren / @czullu : the provided correlation ids do not show any error, so my suspicion is that this goes wrong in code running before or after the requests have been processed by SharePoint (like throttling code, the networking stack, etc.).
I would recommend creating support cases for this so that there's an official channel. I'll see from my side what I can do as well. Thanks for reporting this and providing the repro steps.
Internal mail sent... Please keep adding feedback to this thread.
@EinmalIM : can you let me know your tenant name? (send it to [email protected] if you can't share it openly here)... I need the tenant name to look up the log via the correlation id.
This is my tenant for the above request guid: sflx.sharepoint.com
We see the bad request results in all tenants we or our customers work in. I could provide details from two more tenants of my company, if that helps.
@jansenbe The first support request I opened was rejected since it was opened through a dev tenant, so I tried again from a paid tenant; we'll see what they say this time. The new case # is 25474566 if you need it.
@jansenbe any news about this issue? Did the requestId from one of our tenants help?
Looking at the logs on our end I come to the conclusion that we need a shared backoff/retry state, since there can be, for example, 6 ApplyTemplate runs against the same tenant using the same Azure app (as they run, for example, in the same Azure Function). From the Microsoft throttling perspective it will look like the 429 is not respected, since one thread does not know that another thread got a 429 and that it should back off as well. I will start looking at a possible way to implement this in a generic way. Reason: I usually see a number of 429s before I get the socket error - however, that does not really explain the behavior with the loop from Patrik.
The current idea is an interface in PnPContext with an in-memory default implementation, but allowing an Azure Table or Redis cache to be used instead (a little bit like the token cache); see the sketch below.
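Something along these lines - all names here are hypothetical, nothing like this exists in PnP Framework / PnP Core yet:
using System;
using System.Collections.Concurrent;

// Sketch of the shared backoff/retry state described above: parallel ApplyTemplate runs
// against the same tenant share one "back off until" timestamp, so a 429 seen by one
// thread is respected by the others as well.
public interface IRetryStateStore
{
    // Returns the UTC time until which requests to this tenant should be paused, if any.
    DateTimeOffset? GetBackoffUntil(string tenantKey);

    // Called when a 429/503 (or socket error) is seen, together with the Retry-After value.
    void RegisterBackoff(string tenantKey, TimeSpan retryAfter);
}

// In-memory default implementation; an Azure Table or Redis based implementation could be
// plugged in instead, a little bit like the token cache.
public class InMemoryRetryStateStore : IRetryStateStore
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _backoffUntil =
        new ConcurrentDictionary<string, DateTimeOffset>();

    public DateTimeOffset? GetBackoffUntil(string tenantKey)
    {
        return _backoffUntil.TryGetValue(tenantKey, out var until) && until > DateTimeOffset.UtcNow
            ? until
            : (DateTimeOffset?)null;
    }

    public void RegisterBackoff(string tenantKey, TimeSpan retryAfter)
    {
        var until = DateTimeOffset.UtcNow.Add(retryAfter);
        // Keep the latest "back off until" value if several threads report a 429 at once.
        _backoffUntil.AddOrUpdate(tenantKey, until, (key, existing) => until > existing ? until : existing);
    }
}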
In our case we have some automated provisioning tasks that run at night, in this case 3:00 AM, which rules out parallel user actions. Looking only at our provisioning logs, I see no 429 before the 400 responses. Part of the provisioning sets up remote event receivers, so we have SharePoint starting parallel tasks in our backend due to added or updated items in SharePoint lists. While that might add up and might cause more 429 errors with higher Retry-After headers, the 400 Bad Request response seems to happen independently of 429 in our systems.
I'm working on getting the needed internal attention, but the more of you (or your customers) create support cases, the more visibility the issue will get. So please use the official channels (like Premier support) to create an issue for this.
We have two more people, seeing the http 400 error in this thread: https://github.com/SharePoint/sp-dev-docs/issues/6955
We've got the issue on multiple customer tenants and created support cases. We have the feeling support is not really treating it as something they know is failing, and we're still trying to convince them there might be an issue. Really hoping it's getting noticed somewhere. We're running Azure Functions with the PnP provisioning framework for multiple customers and they have been failing for 3 weeks or so now.
Hi @PeterHeibrink , I might be facing the same - are you using C# .NET Core functions with PnP?
Hi Anthony, we're not using .NET Core, but the full blown old school version currently. But it might still be the same issue. For us it looks like we're no longer receiving 429 throttle messages but receive a 400 error instead. The errors are not always thrown on applying a PnP template, but for larger templates we ALWAYS get the error. Our only way around it would be to create our own PnP version and handle a 400 error the same way a 429 is currently handled, but we prefer to stay in line with the community version.
Microsoft fixed the issue where SharePoint sometimes returns an http 400 on May 7th, as reported here: https://github.com/SharePoint/sp-dev-docs/issues/6955
We could not find any http 400 failures in our logs after 5/7, so the fix seems to work in our case.
I am still seeing this issue with "SSL connection cannot be established" and can reproduce it repeatedly. I am also working with MS support on this. The 400 Bad Request however seems to be fixed as I cannot reproduce that anymore.
The SSL issue is most likely related to the deprecation of TLS 1.0 and 1.1: https://docs.microsoft.com/en-us/microsoft-365/compliance/tls-1.0-and-1.1-deprecation-for-office-365?view=o365-worldwide
Are you running with .NET 4.5? If so, there are some manual steps that need to be taken on the machine: https://docs.microsoft.com/en-us/mem/configmgr/core/plan-design/security/enable-tls-1-2-client#configure-for-strong-cryptography
If the machine is win7/win8 it might also need: https://support.microsoft.com/en-us/topic/update-to-enable-tls-1-1-and-tls-1-2-as-default-secure-protocols-in-winhttp-in-windows-c4bd73d2-31d7-761e-0178-11268bb10392
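For completeness, on older .NET Framework (4.5.x) apps you can also opt in to TLS 1.2 from code before any requests are made; the registry changes in the links above are the recommended fix, this is just a minimal sketch:
using System.Net;

// Opt in to TLS 1.2 explicitly; only needed on older .NET Framework versions where it is
// not the default protocol. On newer frameworks this line is unnecessary.
ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;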
@ificator As my comment above shows, I can reproduce the issue using a simple .NET Core 3.1 console app using modern authentication, and before seeing the error it succeeds with thousands of calls. I cannot, however, reproduce it using a .NET Framework 4.6.1 console app using SharePointOnlineCredentials doing 10000 calls. All this is running on Windows 10. In other words, I think we can rule out TLS versions.
Sorry @patrikhellgren, I didn't read this entire thread - just jumped in to see if there was confirmation that random 400s were resolved and saw your mention of SSL.
From the comment you referenced it sounds like a lot of calls do succeed, and then you hit one that fails due to the SSL error? That would definitely suggest it's not something as general as the client not supporting TLS 1.2, as every request would fail in that case.
@ificator No problem, just glad someone is looking into these issues. Just to further rule out any TLS issues, here is a Fiddler trace of the 8000th (give or take a few) request in a loop where I reproduced this issue today. As you can see it is TLS 1.2, and the connection was forcibly closed by the remote host, which then causes the "SSL connection cannot be established" exception in the client application. I also sent the whole trace to MS support.
Same issue here, .NET 5.0.
{"The SSL connection could not be established, see inner exception."}
{" Received an unexpected EOF or 0 bytes from the transport stream."}
" at System.Net.Security.SslStream.<<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|182_0>d1.MoveNext()\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at System.Net.Security.SslStream.<ReceiveBlobAsync>d__172
1.MoveNext()\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n at System.Net.Security.SslStream.<ForceAuthenticationAsync>d__171`1.MoveNext()\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at System.Net.Http.ConnectHelper.<EstablishSslConnectionAsyncCore>d__4.MoveNext()"
@trochjohn / @patrikhellgren : do you still see these SSL errors?
@jansenbe I just tested again and I am still seeing the "An existing connection was forcibly closed by the remote host" errors. Unfortunately my last support case regarding this was also closed, with the reason that they could not help with developer issues (!), even though I clearly told them that this was not a developer issue and that it affects several others as well. I was told I could try to open a paid developer case instead, so I guess I will try that too, but I don't have too much hope.