Azure-Functions icon indicating copy to clipboard operation
Azure-Functions copied to clipboard

Host ID Collision Error

Open mathewc opened this issue 4 years ago • 6 comments

Motivation

This change is meant to address longstanding issue https://github.com/Azure/azure-functions-host/issues/2015. The underlying issue is caused by the following things being true:

  1. The computed host id is long enough that it requires truncation (app name > 32 characters in length).
  2. The Function app is sharing a storage account with another Function app whose truncated host id is the same.
  3. The Function app is using one or more trigger types that store state under HostID in the storage account

If these are all true, then there is a collision and you can end up with unexpected outcomes as the two apps share state.

When this issue is detected for an app a log will be written. In v4 this will be an Error. In v3 this will be a Warning:

"A collision for Host ID '{hostid}' was detected in the configured storage account. For more information, see https://aka.ms/functions-hostid-collision."

Impact

At the point in time when this issue was create, about 28% of v3 apps have a name subject to truncation. Of those, only apps that are sharing storage accounts would be affected, which is likely a much lower percentage.

Compat-mode support

There will be an app setting FUNCTIONS_HOSTID_CHECK_LEVEL that supports values Warning/Error. In v3 the default will be Warning, and in v4 it will be Error. By default when this issue is detected in v4 it will be a hard failure. As an escape hatch this app setting can be used to change back to warning if needed.

Alternatives

Yes we discussed alternatives, but none of those were acceptable.

Detection

The v3 warning can be used to detect this.

Support

The error will have an aka.ms link with documentation on the issue and the recommended course of action for affected customers to take.

Documentation

Documentation impact

Components impacted

Functions Host

Performance

No.

mathewc avatar Sep 23 '21 18:09 mathewc

In the upgrade docs here it says you should "Use a separate storage account for each function app" and then links to this GitHub Issue. But in the docs here it says, "It's possible for multiple function apps to share the same storage account without any issues". It does go on to say that "... this approach isn't always a good idea in a production environment", but it doesn't say not to do it. Can this be clarified in the docs?

ijabit avatar Jan 06 '22 14:01 ijabit

Its okay from the point of functionality as function apps will not step over each other if multiple apps sharing the same storage account in general (except for the issue mentioned above). But it is not recommended due concerns around best practices like availability (single point of failure), security (better isolation/access controls when each app has its own secret)

balag0 avatar Jan 07 '22 04:01 balag0

We ran into problems with this, using Functions v4, dotnet-isolated; however, the only symptom was that the job host constantly restarted, there were no errors nor warnings logged anywhere (while the constant stop-start was logged properly) (this was true for multiple functions, separately). The root cause was indeed the too long name, the issue was actually caused by the function and its staging slot. Actually not both the functions were restarting, only one, sometimes it was the staging slot but sometimes the live one.

The workaround we used is a slot-level config override for AzureFunctionsWebHost__hostId.

Could you please check how is it possible that there were no errors logged at all?

That's all the logs we see:

2022-04-26T12:30:59.278 [Information] Host lock lease acquired by instance ID 'eb635ef47d9374f47ee1dde305a3c9fd'.
2022-04-26T12:31:21.059 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:java
2022-04-26T12:31:21.063 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:node
2022-04-26T12:31:21.063 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:powershell
2022-04-26T12:31:21.103 [Warning] The Dashboard setting is no longer supported. See https://aka.ms/functions-dashboard for details.
2022-04-26T12:31:21.190 [Information] Initializing Warmup Extension.
2022-04-26T12:31:21.236 [Information] Initializing Host. OperationId: '1c31c0b9-8022-4828-8d8a-404dea961f31'.
2022-04-26T12:31:21.240 [Information] Host initialization: ConsecutiveErrors=0, StartupCount=1, OperationId=1c31c0b9-8022-4828-8d8a-404dea961f31
2022-04-26T12:31:21.279 [Information] ApplicationInsightsLoggerOptions{"SamplingSettings": {"EvaluationInterval": "00:00:15","InitialSamplingPercentage": 100.0,"MaxSamplingPercentage": 100.0,"MaxTelemetryItemsPerSecond": 20.0,"MinSamplingPercentage": 0.1,"MovingAverageRatio": 0.25,"SamplingPercentageDecreaseTimeout": "00:02:00","SamplingPercentageIncreaseTimeout": "00:15:00"},"SamplingExcludedTypes": "Request","SamplingIncludedTypes": null,"SnapshotConfiguration": null,"EnablePerformanceCountersCollection": true,"HttpAutoCollectionOptions": {"EnableHttpTriggerExtendedInfoCollection": true,"EnableW3CDistributedTracing": true,"EnableResponseHeaderInjection": true},"LiveMetricsInitializationDelay": "00:00:15","EnableLiveMetrics": true,"EnableDependencyTracking": true,"DependencyTrackingOptions": null}
2022-04-26T12:31:21.280 [Information] LoggerFilterOptions{"MinLevel": "None","Rules": [{"ProviderName": null,"CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": "None","Filter": null},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider","CategoryName": null,"LogLevel": "Trace","Filter": null}]}
2022-04-26T12:31:21.280 [Information] LoggerFilterOptions{"MinLevel": "None","Rules": [{"ProviderName": null,"CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": "None","Filter": null},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider","CategoryName": null,"LogLevel": "Trace","Filter": null}]}
2022-04-26T12:31:21.280 [Information] ConcurrencyOptions{"DynamicConcurrencyEnabled": false,"MaximumFunctionConcurrency": 500,"CPUThreshold": 0.8,"SnapshotPersistenceEnabled": true}
2022-04-26T12:31:21.280 [Information] FunctionResultAggregatorOptions{"BatchSize": 1000,"FlushTimeout": "00:00:30","IsEnabled": true}
2022-04-26T12:31:21.280 [Information] SingletonOptions{"LockPeriod": "00:00:15","ListenerLockPeriod": "00:01:00","LockAcquisitionTimeout": "10675199.02:48:05.4775807","LockAcquisitionPollingInterval": "00:00:05","ListenerLockRecoveryPollingInterval": "00:01:00"}
2022-04-26T12:31:21.280 [Information] ServiceBusOptions{"PrefetchCount": 0,"MessageHandlerOptions": {"AutoComplete": true,"MaxAutoRenewDuration": "00:05:00","MaxConcurrentCalls": 32},"SessionHandlerOptions": {"AutoComplete": true,"MaxAutoRenewDuration": "00:05:00","MaxConcurrentSessions": 2000,"MessageWaitTimeout": "00:01:00"},"BatchOptions": {"MaxMessageCount": 1000,"OperationTimeout": "00:01:00","AutoComplete": true}}
2022-04-26T12:31:21.281 [Information] Starting JobHost
2022-04-26T12:31:21.282 [Information] Starting Host (HostId=[too-long-functionname-trimmed-], InstanceId=8dd736e0-24c3-44b8-958c-073ca23ee788, Version=4.3.0.17971, ProcessId=15204, AppDomainId=1, InDebugMode=True, InDiagnosticMode=False, FunctionsExtensionVersion=~4)
2022-04-26T12:31:21.290 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:java
2022-04-26T12:31:21.290 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:node
2022-04-26T12:31:21.291 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:powershell
2022-04-26T12:31:21.295 [Information] Loading functions metadata
2022-04-26T12:31:21.333 [Information] Reading functions metadata
2022-04-26T12:31:21.384 [Information] 1 functions found
2022-04-26T12:31:21.394 [Information] 1 functions loaded
2022-04-26T12:31:21.408 [Information] Generating 1 job function(s)
2022-04-26T12:31:21.444 [Information] Found the following functions:Host.Functions.HubListenerFunction
2022-04-26T12:31:21.475 [Information] Initializing function HTTP routesNo HTTP routes mapped
2022-04-26T12:31:21.479 [Information] Host initialized (192ms)
2022-04-26T12:31:21.486 [Information] HttpOptions{"DynamicThrottlesEnabled": false,"EnableChunkedRequestBinding": false,"MaxConcurrentRequests": -1,"MaxOutstandingRequests": -1,"RoutePrefix": "api"}
2022-04-26T12:31:21.579 [Information] Host started (296ms)
2022-04-26T12:31:21.579 [Information] Job host started
2022-04-26T12:31:22.277 [Information] Host Status: {"id": "[too-long-functionname-trimmed-]","state": "Running","version": "4.3.0.17971","versionDetails": "4.3.0+64614b7392f36e3871862d22d21e72304df901e9","platformVersion": "97.0.10.675","instanceId": "eb635ef47d9374f47ee1dde305a3c9fdb204691970dda38684f67a86ac3e26fb","computerName": "wn0sdwk0007EC","processUptime": 17534,"functionAppContentEditingState": "Unknown"}
2022-04-26T12:31:22.280 [Information] Host Status: {"id": "[too-long-functionname-trimmed-]","state": "Running","version": "4.3.0.17971","versionDetails": "4.3.0+64614b7392f36e3871862d22d21e72304df901e9","platformVersion": "97.0.10.675","instanceId": "eb635ef47d9374f47ee1dde305a3c9fdb204691970dda38684f67a86ac3e26fb","computerName": "wn0sdwk0007EC","processUptime": 17545,"functionAppContentEditingState": "Unknown"}
2022-04-26T12:31:22.353 [Information] Stopping JobHost
2022-04-26T12:31:22.354 [Information] Stopping the listener 'Microsoft.Azure.WebJobs.ServiceBus.Listeners.ServiceBusListener' for function 'HubListenerFunction'
2022-04-26T12:31:22.438 [Information] Worker process started and initialized.
2022-04-26T12:31:26.317 [Information] Host lock lease acquired by instance ID 'eb635ef47d9374f47ee1dde305a3c9fd'.

Novak-Peter avatar Apr 27 '22 11:04 Novak-Peter

Our v3 Functions instances are configured to use unique hubNames but share a single storageProvider.trackingStoreNamePrefix in a single storage account. This setup allows each Functions instance to fetch the orchestration statuses of all the other running instances from a single source of truth. Using this technique, it is possible to apply a side-by-side deployment scenario of our Durable Functions based asynchronous API wherein the superseded "legacy" function backends of our API gateway controlled service can complete their running orchestrations while the "current" functions backend version can execute own orchestrations while still being able to retrieve the processing status of the legacy backend.

I understand that this setup would be impossible when using separate storage accounts, correct?

kepikoi avatar May 05 '22 14:05 kepikoi