azure-functions-python-worker icon indicating copy to clipboard operation
azure-functions-python-worker copied to clipboard

[BUG] 502 Bad Gateway on Python Function

Open freeone3000 opened this issue 4 years ago • 14 comments

Opened new issue by request of @balag0

Investigative information

Please provide the following:
  • Timestamp: Wed, 27 May 2020 18:35:38 GMT
  • Function App name: covidqmdataimport
  • Function name(s) (as appropriate): DataImport
  • Core Tools version: azure-functions==1.2.0, func is 3.0.2245

Repro steps

Provide the steps required to reproduce the problem:

Issue POST request with correct key and body: {"target":"covid-staging"}

Expected behavior

Provide a description of the expected behavior.

As function has no return, would expect a 202 Accepted return immediately, as the function needs to process in the background. Function execution is expected to last approximately 30 minutes.

Actual behavior

Provide a description of the actual behavior observed.

After approximately 4 minutes (4 minutes 5 seconds this call), function instead returns a 502 Bad Gateway exception.

Known workarounds

Provide a description of any known workarounds.

Contents of the requirements.txt file:

Provide the requirements.txt file to help us find out module related issues.

pandas==1.0.3 azure-functions==1.2.0 azure-cosmos==3.1.1 requests==2.23.0 beautifulsoup4==4.9.0 xlrd==1.2.0

Related information

Provide any related information
  • Links to source
  • Bindings used

freeone3000 avatar May 27 '20 18:05 freeone3000

Thanks. I see the invocation is failing with the following exception

Result: Failure Exception: RuntimeError: function 'DataImport' without a $return binding returned a non-None value Stack: File '/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py', line 319, in _handle__invocation_request f'function {fi.name!r} without a $return binding '

These issues provide more information on the execption https://github.com/Azure/azure-functions-python-worker/issues/74 https://github.com/Azure/azure-functions-python-worker/pull/75

This probably means the function.json and the actual function code do not match. you should also be able to repro this error running the app locally.

balag0 avatar May 27 '20 19:05 balag0

I have updated the python function to no longer return a variable, but the execution at Wed, 27 May 2020 19:35:52 GMT and the execution at ‎Wednesday, ‎May ‎27, ‎2020, ‏‎19:32:45 GMT had the same issue after the code update.

Is there any way for me to see the error that you're seeing? It's not showing up in Application Insights.

freeone3000 avatar May 27 '20 19:05 freeone3000

Can you check the "traces" table

https://docs.microsoft.com/en-us/azure/azure-monitor/log-query/get-started-portal#write-and-run-basic-queries

balag0 avatar May 27 '20 21:05 balag0

running traces as the kusto query with time range of 1 hr against application insights gives no results. Last 4 hours show successful deploys and invocations and some print statments, but nothing showing why the Postman query against the service is giving a 502 to me instead of a 202 or a 200. I can see the invocations triggered by the HTTP queries, and they look successful, so it's unclear why I would get an error back.

freeone3000 avatar May 27 '20 21:05 freeone3000

Hi @freeone3000,

In our most recent log, 2020-05-28 16:19:05,

The operation 'ScanLogs' with id '77fd14be-b8e5-47e0-af66-ad7bdb5231d6' did not complete in '00:02:00'.

In consumption plan, you cannot run a function with 30 minutes. It only allows the function app to run 10 minutes for each invocation. See if changing the setting in https://docs.microsoft.com/en-us/azure/azure-functions/functions-host-json#functiontimeout will help.

Hazhzeng avatar May 28 '20 18:05 Hazhzeng

Since 2020-05-21, before initial raise of this issue, my host.json file looked like https://gist.github.com/freeone3000/7bd5189263bbc5dbf9381e883f432c59 which includes a functionTimeout of "01:30:00". It had already been set, so I don't think this is the issue.

I've ran the query again today, and am still getting 502 Gateway Timeouts. traces still contains stdout from the service, but no useful logs.

freeone3000 avatar May 29 '20 19:05 freeone3000

Hi. Has there been any news on this issue? Do you need anything further from me?

freeone3000 avatar Jun 03 '20 15:06 freeone3000

Hi @freeone3000 how long are your function invocations expected to run ?

anirudhgarg avatar Jun 10 '20 15:06 anirudhgarg

We've managed to get execution time down to around 250 seconds, and we see some successful invocations with it as a scheduled function, but we're still seeing issues when calling the function manually through postman.

freeone3000 avatar Jun 10 '20 21:06 freeone3000

@freeone3000 - is it consistently failing when using postman or intermittently - if it is consistently failing then there might just be an issue in how you are constructing the request. Also after reducing the execution time is it now consistently succeeding for the scheduled function ?

anirudhgarg avatar Jun 12 '20 20:06 anirudhgarg

Consistently failing with postman, which I thought might be the issue so I also attempted to construct via requests, which also fails, unless it's within Azure Function Host, where it succeeds.

Reducing the execution time has caused 6/8 of the last runs to function, so it's improved, but not consistent - if one of our backend databases takes another minute or two longer than usual, we then fail.

If I try directly via curl: curl --location --request POST 'http://SERVICE_NAME/api/DataImport'
--header 'Content-Type: application/json'
--header 'x-functions-key: XXXXXXXXX'
--data-raw '{"target":"staging"}'

I still get either a 404 or a 502. The above call should work reliably.

freeone3000 avatar Jun 12 '20 20:06 freeone3000

The header for the functions key looks correct. It is surprising that you get a 404 sometimes and 502 sometimes. What do you get if you try the "Test" in the portal ?

anirudhgarg avatar Jun 13 '20 02:06 anirudhgarg

I see the following exceptions on the function app. @brettsam I found this issue from earlier with the exception. Looks like the host restarts when this exception is thrown and that causes further issues since the host doesn't start back up properly. Any ideas on what we can do avoid this exception?

@freeone3000 If possible could you split the functions into separate apps to see if we can isolate the failures.

         
  t Summary Details Level
  2020-06-12 20:54:22.2660000 An unhandled exception has occurred. Host is shutting down. System.TimeoutException : The operation 'ScanContainer' with id '392f6cd0-0faf-43f9-9ad2-3a9d6fc34087' did not complete in '00:02:00'.    at async Microsoft.Azure.WebJobs.Extensions.Storage.TimeoutHandler.ExecuteWithTimeoutT at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Extensions.Storage\TimeoutHandler.cs : 30 2
  2020-06-12 20:54:47.0800000 A host error has occurred during startup operation '969372f7-4d20-4047-85fa-326fba4119bf'. Microsoft.Azure.WebJobs.Script.HostInitializationException : Did not find functions with language [python].    at Microsoft.Azure.WebJobs.Script.Utility.VerifyFunctionsMatchSpecifiedLanguage(IEnumerable1 functions,String workerRuntime,Boolean isPlaceholderMode,Boolean isHttpWorker) at /src/azure-functions-host/src/WebJobs.Script/Utility.cs : 562    at async Microsoft.Azure.WebJobs.Script.ScriptHost.GetFunctionDescriptorsAsync(IEnumerable1 functions,IEnumerable1 descriptorProviders) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 673    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.InitializeFunctionDescriptorsAsync(IEnumerable1 functionMetadata) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 493    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.InitializeAsync(CancellationToken cancellationToken) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 295    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.StartAsyncCore(CancellationToken cancellationToken) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 248    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.UnsynchronizedStartHostAsync(ScriptHostStartupOperation activeOperation,Int32 attemptCount,JobHostStartupMode startupMode) at /src/azure-functions-host/src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs : 264 2
  2020-06-12 21:02:28.4200000 An unhandled exception has occurred. Host is shutting down. System.TimeoutException : The operation 'ScanContainer' with id '21282512-7cfc-4db1-a7a5-8a690e87955d' did not complete in '00:02:00'.    at async Microsoft.Azure.WebJobs.Extensions.Storage.TimeoutHandler.ExecuteWithTimeoutT at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Extensions.Storage\TimeoutHandler.cs : 30 2
  2020-06-12 21:02:49.5870000 A host error has occurred during startup operation 'ccb7e985-82cc-41bf-9efb-0034394e52d4'. Microsoft.Azure.WebJobs.Script.HostInitializationException : Did not find functions with language [python].    at Microsoft.Azure.WebJobs.Script.Utility.VerifyFunctionsMatchSpecifiedLanguage(IEnumerable1 functions,String workerRuntime,Boolean isPlaceholderMode,Boolean isHttpWorker) at /src/azure-functions-host/src/WebJobs.Script/Utility.cs : 562    at async Microsoft.Azure.WebJobs.Script.ScriptHost.GetFunctionDescriptorsAsync(IEnumerable1 functions,IEnumerable1 descriptorProviders) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 673    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.InitializeFunctionDescriptorsAsync(IEnumerable1 functionMetadata) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 493    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.InitializeAsync(CancellationToken cancellationToken) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 295    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.StartAsyncCore(CancellationToken cancellationToken) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 248    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.UnsynchronizedStartHostAsync(ScriptHostStartupOperation activeOperation,Int32 attemptCount,JobHostStartupMode startupMode) at /src/azure-functions-host/src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs : 264 2
  2020-06-12 21:06:05.0800000 An unhandled exception has occurred. Host is shutting down. System.TimeoutException : The operation 'ScanLogs' with id '13f6975a-bd81-4448-b1cf-6e9c4c53b789' did not complete in '00:02:00'.    at async Microsoft.Azure.WebJobs.Extensions.Storage.TimeoutHandler.ExecuteWithTimeoutT at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Extensions.Storage\TimeoutHandler.cs : 30 2
  2020-06-12 21:06:30.1560000 A host error has occurred during startup operation 'fa16040c-20d9-485b-a3e8-372718490731'. Microsoft.Azure.WebJobs.Script.HostInitializationException : Did not find functions with language [python].    at Microsoft.Azure.WebJobs.Script.Utility.VerifyFunctionsMatchSpecifiedLanguage(IEnumerable1 functions,String workerRuntime,Boolean isPlaceholderMode,Boolean isHttpWorker) at /src/azure-functions-host/src/WebJobs.Script/Utility.cs : 562    at async Microsoft.Azure.WebJobs.Script.ScriptHost.GetFunctionDescriptorsAsync(IEnumerable1 functions,IEnumerable1 descriptorProviders) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 673    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.InitializeFunctionDescriptorsAsync(IEnumerable1 functionMetadata) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 493    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.InitializeAsync(CancellationToken cancellationToken) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 295    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.ScriptHost.StartAsyncCore(CancellationToken cancellationToken) at /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs : 248    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at async Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.UnsynchronizedStartHostAsync(ScriptHostStartupOperation activeOperation,Int32 attemptCount,JobHostStartupMode startupMode) at /src/azure-functions-host/src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs : 264 2

balag0 avatar Jun 13 '20 07:06 balag0

When each function is spun off into their own app function, and care is taken to never throw an exception from beyond the python boundary, execution appears to proceed, even with no code changes beyond adding an over-broad try/except to every invoked main(). It appears that this individual App Function has entered into some broken state.

freeone3000 avatar Jun 15 '20 17:06 freeone3000