letsencrypt-siteextension

Access exception upon certificate renewal attempt

Open InteXX opened this issue 5 years ago • 20 comments

I'm getting an error when the webjob attempts to renew a certificate:

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Cleanup ---> Microsoft.Rest.Azure.CloudException: The client '[Redacted]' with object id '[Redacted]' does not have authorization to perform action 'Microsoft.Web/sites/config/list/action' over scope '/subscriptions/[Redacted]/resourceGroups/[Redacted]/providers/Microsoft.Web/sites/[Redacted]/config/publishingcredentials'.

The full stack report is below.

I've reviewed documentation here and here, but I'm afraid I'm still at a loss.

I've found the Microsoft.Web/sites/config/list/action provider here, but it's not listed in the available roles and there's no indication as to how to give it access to this:

/subscriptions/[Redacted]/resourceGroups/[Redacted]/providers/Microsoft.Web/sites/[Redacted]/config/publishingcredentials

Everything had been working well for the past year; it only started failing within the past month or so. I have two websites on which I'm running the job, and suddenly both are failing with similar errors. I've changed nothing in my Azure configuration.

{
  "Type": "FunctionCompleted",
  "EndTime": "2019-03-21T03:11:53.1829332+00:00",
  "Failure": {
    "ExceptionType": "Microsoft.Azure.WebJobs.Host.FunctionInvocationException",
    "ExceptionDetails": "Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Cleanup ---> Microsoft.Rest.Azure.CloudException: The client '[Redacted]' with object id '[Redacted]' does not have authorization to perform action 'Microsoft.Web/sites/config/list/action' over scope '/subscriptions/[Redacted]/resourceGroups/[Redacted]/providers/Microsoft.Web/sites/[Redacted]/config/publishingcredentials'.
   at Microsoft.Azure.Management.WebSites.WebAppsOperations.<BeginListPublishingCredentialsWithHttpMessagesAsync>d__210.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.WebAppsOperationsExtensions.<BeginListPublishingCredentialsAsync>d__411.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.WebAppsOperationsExtensions.BeginListPublishingCredentials(IWebAppsOperations operations, String resourceGroupName, String name)
   at LetsEncrypt.Azure.Core.KuduHelper.GetKuduClient(WebSiteManagementClient client, IAzureWebAppEnvironment settings) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.Core\\KuduHelper.cs:line 15
   at LetsEncrypt.Azure.Core.Services.KuduFileSystemAuthorizationChallengeProvider..ctor(IAzureWebAppEnvironment azureEnvironment, IAuthorizationChallengeProviderConfig config) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.Core\\Services\\KuduFileSystemAuthorizationChallengeProvider.cs:line 22
   at LetsEncrypt.Azure.Core.CertificateManager..ctor(AppSettingsAuthConfig config) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.Core\\CertificateManager.cs:line 31
   at LetsEncrypt.SiteExtension.Functions.Cleanup(TimerInfo timerInfo) in D:\\a\\1\\s\\LetsEncrypt.SiteExtension.WebJob\\Functions.cs:line 73
   at lambda_method(Closure , Functions , Object[] )
   at Microsoft.Azure.WebJobs.Host.Executors.VoidMethodInvoker`1.InvokeAsync(TReflected instance, Object[] arguments)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`1.<InvokeAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<InvokeAsync>d__22.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithWatchersAsync>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__10.MoveNext()"
  },
  "ParameterLogs": {},
  "FunctionInstanceId": "[Redacted]",
  "Function": {
    "Id": "LetsEncrypt.SiteExtension.Functions.Cleanup",
    "FullName": "LetsEncrypt.SiteExtension.Functions.Cleanup",
    "ShortName": "Functions.Cleanup",
    "Parameters": [
      {
        "Name": "timerInfo",
        "DisplayHints": {
          "Description": "Timer executed on schedule (Daily: 1 occurrences)"
        }
      }
    ]
  },
  "Arguments": {
    "timerInfo": "2019-03-21T03:11:49.9071967+00:00"
  },
  "Reason": "AutomaticTrigger",
  "ReasonDetails": "Timer fired at 2019-03-21T03:11:48.8550732+00:00",
  "StartTime": "2019-03-21T03:11:48.8550732+00:00",
  "OutputBlob": {
    "ContainerName": "azure-webjobs-hosts",
    "BlobName": "output-logs/[Redacted].txt"
  },
  "ParameterLogBlob": {
    "ContainerName": "azure-webjobs-hosts",
    "BlobName": "output-logs/[Redacted].params.txt"
  },
  "HostInstanceId": "[Redacted]",
  "HostDisplayName": "LetsEncrypt.SiteExtension.WebJob",
  "SharedQueueName": "azure-webjobs-host-le-[Redacted]",
  "InstanceQueueName": "azure-webjobs-host-[Redacted]",
  "Heartbeat": {
    "SharedContainerName": "azure-webjobs-hosts",
    "SharedDirectoryName": "heartbeats/le-[Redacted]",
    "InstanceBlobName": "[Redacted]",
    "ExpirationInSeconds": 45
  },
  "WebJobRunIdentifier": {
    "WebSiteName": "[Redacted]",
    "JobType": "Continuous",
    "JobName": "letsencrypt.siteextension.job",
    "RunId": ""
  }
}

InteXX avatar Mar 22 '19 01:03 InteXX

If you generate the client secret for the service principal from the portal, you should be aware that the default lifetime is one year, so maybe the secret has simply expired. You can just look up the service principal in Azure AD (using the client id, if you forgot what you named it) and generate a new secret. This time set the lifetime to non-expiring, then you won't run into this problem again.

sjkp avatar Mar 27 '19 19:03 sjkp

Yes, I ran into that a while back. Since then I've always generated them as non-expiring:

image

@Tsaukpaetra seems to feel that the Service Principal no longer has the role required to access that resource group, but I'm struggling to figure out how to check on that (official documentation is frustrating, to say the least).

Would you concur?

InteXX avatar Mar 27 '19 19:03 InteXX

You can just lookup the service principal in azure ad (using the client id, if you forgot what you named it)

How does one do this?

InteXX avatar Mar 27 '19 19:03 InteXX

Oh, I can see from the other thread that you don't know how to check whether the service principal secret really has expired. You can use this PowerShell script to get the info:

Connect-AzureAD -TenantId "yourtenantid"

$a = Get-AzureADApplication -All $true -Filter "AppId eq 'your-client-id'"
$a.PasswordCredentials

Then you will see something similar to this:

image

If all the end dates are in the past, then you need to create a new password. You can't get this information from the portal UI AFAIK.
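
The "are all the end dates in the past" check can also be scripted once the dates are exported. What follows is only a minimal sketch in Python, assuming the end dates have been copied out as ISO 8601 strings; the function name `all_secrets_expired` and the sample dates are hypothetical and not taken from the thread.

```python
from datetime import datetime, timezone


def all_secrets_expired(end_dates, now=None):
    """Return True when every credential end date lies in the past.

    end_dates: ISO 8601 strings, e.g. copied from the EndDate column.
    now: reference time (defaults to the current UTC time).
    """
    now = now or datetime.now(timezone.utc)
    parsed = [datetime.fromisoformat(d) for d in end_dates]
    # An application with no secrets at all cannot authenticate either,
    # so an empty list is treated as "expired" here.
    return all(d <= now for d in parsed)


# Hypothetical sample values, not taken from the thread.
reference = datetime(2019, 3, 27, tzinfo=timezone.utc)
only_old = ["2018-12-31T00:00:00+00:00"]
old_and_long_lived = ["2018-12-31T00:00:00+00:00", "2299-12-31T00:00:00+00:00"]

print(all_secrets_expired(only_old, now=reference))
print(all_secrets_expired(old_and_long_lived, now=reference))
```

If at least one end date is still in the future, as in the second call, an expired secret is unlikely to be the cause and the role assignment is the next thing to check.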

sjkp avatar Mar 27 '19 19:03 sjkp

If all the end dates are in the past

Hm, it must be something else:

image

InteXX avatar Mar 27 '19 19:03 InteXX

Check that the service principal still has access to the resource group?

sjkp avatar Mar 27 '19 22:03 sjkp

Yes, that's what I'm trying to figure out how to do ;-)

Anyway... I opened a support ticket. Expensive, I know, but this is a must-have. Hopefully I'll be receiving a phone call shortly.

InteXX avatar Mar 27 '19 22:03 InteXX

image

Go to the resource group that contains the web site, click Access control (IAM), and use the Check access feature. If you know the name of your service principal/application you can just search for it and see what permissions it has been assigned. In my case here it is granted access on the subscription, but that is more than required; Contributor on the resource group should be sufficient.
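
To make it clearer why Contributor on the resource group is enough for the failing `Microsoft.Web/sites/config/list/action` call, here is a rough Python sketch of how a role assignment is evaluated: the role's action patterns must match the requested action, and the assignment's scope must be the requested scope or one of its parents. The role patterns below are abridged and written from memory, the scope identifiers are placeholders, and the helper names (`scope_covers`, `is_allowed`) are hypothetical; Azure's real evaluation has more rules (deny assignments, data actions) than this illustration.

```python
from fnmatch import fnmatchcase

# Abridged role definitions (assumption: shortened from the built-in roles;
# consult the Azure documentation for the authoritative lists).
ROLES = {
    "Contributor": {
        "actions": ["*"],
        "not_actions": [
            "Microsoft.Authorization/*/Delete",
            "Microsoft.Authorization/*/Write",
            "Microsoft.Authorization/elevateAccess/Action",
        ],
    },
    "Reader": {"actions": ["*/read"], "not_actions": []},
}


def _matches(patterns, action):
    # Wildcard match; this sketch compares case-insensitively.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def scope_covers(assigned_scope, requested_scope):
    """A role assigned at a parent scope is inherited by its child scopes."""
    assigned = assigned_scope.rstrip("/").lower()
    requested = requested_scope.rstrip("/").lower()
    return requested == assigned or requested.startswith(assigned + "/")


def is_allowed(assignments, action, scope):
    """assignments: list of (role_name, assigned_scope) tuples."""
    for role_name, assigned_scope in assignments:
        role = ROLES[role_name]
        if (scope_covers(assigned_scope, scope)
                and _matches(role["actions"], action)
                and not _matches(role["not_actions"], action)):
            return True
    return False


ACTION = "Microsoft.Web/sites/config/list/action"
# Placeholder identifiers, purely illustrative.
RG = "/subscriptions/sub-id/resourceGroups/my-rg"
SCOPE = RG + "/providers/Microsoft.Web/sites/my-site/config/publishingcredentials"

print(is_allowed([], ACTION, SCOPE))                     # no role assignment
print(is_allowed([("Reader", RG)], ACTION, SCOPE))       # read-only role
print(is_allowed([("Contributor", RG)], ACTION, SCOPE))  # resource-group Contributor
```

Only the last call returns `True`: a missing assignment or a read-only role does not cover the `.../list/action` operation, which is consistent with the authorization error at the top of the thread.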

sjkp avatar Mar 27 '19 22:03 sjkp

Well shucky darn, that seems to have been it.

I'd come upon this same page in all my searching, but I didn't realize you could search for an application by name. It didn't appear in the pick list, so I figured I was at the wrong screen.

So I added lews to the Contributor role. I'll wait for the job to run again, and we'll see what happens.

Thanks for the screen shot and the tip!

InteXX avatar Mar 27 '19 23:03 InteXX

NP

sjkp avatar Mar 27 '19 23:03 sjkp

It's set to run again in just over six hours. I'd run it manually, but I want to wait to see what it does under schedule.

I'll report back here. Keep the issue open?

InteXX avatar Mar 27 '19 23:03 InteXX

Just close it when you have validated that it works :)

sjkp avatar Mar 27 '19 23:03 sjkp

Will do.

InteXX avatar Mar 27 '19 23:03 InteXX

OK, we have a new cert:

image

...but alas we only have partial success. Stack trace below.

This is a different error. Is it an Azure issue or a LEWS issue?

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Cleanup ---> Microsoft.Rest.Azure.CloudException: Operation returned an invalid status code 'Conflict'
   at Microsoft.Azure.Management.WebSites.CertificatesOperations.<DeleteWithHttpMessagesAsync>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.CertificatesOperationsExtensions.<DeleteAsync>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Management.WebSites.CertificatesOperationsExtensions.Delete(ICertificatesOperations operations, String resourceGroupName, String name)
   at LetsEncrypt.Azure.Core.Services.WebAppCertificateService.RemoveCertificate(WebSiteManagementClient webSiteClient, Certificate s) in D:\a\1\s\LetsEncrypt.SiteExtension.Core\Services\WebAppCertificateService.cs:line 104
   at LetsEncrypt.Azure.Core.Services.WebAppCertificateService.<>c__DisplayClass4_1.<RemoveExpired>b__1(Certificate s) in D:\a\1\s\LetsEncrypt.SiteExtension.Core\Services\WebAppCertificateService.cs:line 96
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at LetsEncrypt.Azure.Core.Services.WebAppCertificateService.RemoveExpired(Int32 removeXNumberOfDaysBeforeExpiration) in D:\a\1\s\LetsEncrypt.SiteExtension.Core\Services\WebAppCertificateService.cs:line 96
   at LetsEncrypt.SiteExtension.Functions.Cleanup(TimerInfo timerInfo) in D:\a\1\s\LetsEncrypt.SiteExtension.WebJob\Functions.cs:line 73
   at lambda_method(Closure , Functions , Object[] )
   at Microsoft.Azure.WebJobs.Host.Executors.VoidMethodInvoker`1.InvokeAsync(TReflected instance, Object[] arguments)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`1.<InvokeAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<InvokeAsync>d__22.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithWatchersAsync>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__10.MoveNext()

InteXX avatar Mar 28 '19 04:03 InteXX

Looks like it was trying to delete the old certs but failed? I assume this may have happened because the cert was still associated with a site.

Tsaukpaetra avatar Mar 28 '19 04:03 Tsaukpaetra

Maybe so... I replayed it and it succeeded this time.

I also noticed that Cleanup() ran before RenewCertificate(). What do you make of that?

InteXX avatar Mar 28 '19 04:03 InteXX

https://github.com/sjkp/letsencrypt-siteextension/blob/9660b42c5fc4b1f8fa04bde6e01d64a8d0b03f2f/LetsEncrypt.SiteExtension.Core/Services/WebAppCertificateService.cs#L102-L105

InteXX avatar Mar 28 '19 04:03 InteXX

I think in theory it should have skipped any certs that were still in use (see line 92, just above your reference), but somehow it found a letsencrypt cert that was still associated with a site it didn't know about. 🤷‍♂️ I assume the resource group had letsencrypt certs belonging to another site that wasn't the one the configuration was pointing to.

Tsaukpaetra avatar Mar 28 '19 04:03 Tsaukpaetra

That makes sense... that's a little bit of why I'm leaving the issue open for the time being. Simon may want to have a look at that part of it.

It'd be hard to reproduce this one, I think.

InteXX avatar Mar 28 '19 06:03 InteXX

Yeah, it'll take three months now to repro (since I assume you didn't export the expired cert). Suggested repro steps:

  1. Have two sites install the letsencrypt extension and get certs for their sites, and both store the certs in the same resource group.
  2. Remove the letsencrypt extension from one of the sites (or otherwise disable its webjob) so that its cert expires or is otherwise not auto-renewed.
  3. The remaining letsencrypt webjob should notice that there's a letsencrypt cert that's expired and try to delete it, but will fail because it hasn't removed/replaced it on the site (because it doesn't know about the site).
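
The repro steps above can be illustrated with a small Python sketch. It is not the extension's actual code: the function `deletable_certs`, the thumbprints, and the dates are all hypothetical, and it only models the assumption made in this thread that the cleanup job filters out certificates bound to the site it knows about.

```python
from datetime import date


def deletable_certs(certs, known_bindings, today):
    """Expired certificates that the job believes are unused.

    certs: dicts with "thumbprint" and "expires" (date) for every
           certificate in the resource group.
    known_bindings: thumbprints bound to hostnames of the sites the job
                    knows about.
    """
    return [c["thumbprint"] for c in certs
            if c["expires"] < today and c["thumbprint"] not in known_bindings]


# Hypothetical data following the repro steps: two sites, one resource group.
certs = [
    {"thumbprint": "AAA", "expires": date(2019, 6, 1)},  # site A, renewed
    {"thumbprint": "BBB", "expires": date(2019, 3, 1)},  # site B, expired
]
bindings_site_a = {"AAA"}
bindings_all_sites = {"AAA", "BBB"}  # site B still serves its expired cert
today = date(2019, 3, 28)

# Looking only at its own site, the job selects BBB for deletion; the delete
# would then be rejected because the cert is still bound to site B.
print(deletable_certs(certs, bindings_site_a, today))
# Checking the bindings of every site in the resource group avoids that.
print(deletable_certs(certs, bindings_all_sites, today))
```

Under these assumptions the first call selects `BBB` and the second selects nothing, which matches the suspected cause of the 'Conflict' response.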

In this case, I thought there was some standalone multi-site thing that was added, but I never got around to setting it up. That would probably be a better solution in the long run if multiple sites need to be certed.

Tsaukpaetra avatar Mar 28 '19 07:03 Tsaukpaetra