Unable to use AzureLogHandler in a Gunicorn/Flask app
Describe your environment.
We want to export Python logs to Azure Monitor.
We have a simple Python Gunicorn/Flask web app in which we are unable to call logger.info() (or other log methods) and have the message successfully sent to Application Insights.
It seems that when the AzureLogHandler runs inside Flask/Gunicorn, only the main process uses the handler; child threads/processes are unable to send log lines to Azure Application Insights.
For example, when an HTTP request is received by a Flask/Gunicorn app, it should be able to log through the handler, but nothing happens. Having HTTP requests handled in child processes (threads or forks) seems to be related to this behaviour.
The current environment is running in a Docker container with:
- ubuntu:18.04
- Python 3.6.9
- Flask==1.1.2
- gunicorn==20.0.4
- opencensus-ext-azure==1.0.4
- opencensus-ext-flask==0.7.3
Steps to reproduce. I have created a minimal project on GitHub:
git clone https://github.com/freedev/flask-azure-application-insights
cd flask-azure-application-insights
./run-docker.sh <InstrumentationKey>
Replace <InstrumentationKey> with your own instrumentation key (00000000-0000-0000-0000-000000000000). The run-docker.sh script will start a simple web app running in a Docker instance and listening on port 5000.
Once the Docker instance is started, a first log message is written successfully by the main process.
After that, we expect each HTTP request that reaches the Flask application to be logged through the AzureLogHandler, but none arrives in Application Insights.
Try with:
curl http://localhost:5000/logging
What is the expected behavior? We expect to be able to use the AzureLogHandler and write to Application Insights for each HTTP request received;
each HTTP request that reaches the Flask application should be logged through the AzureLogHandler.
For example: once the application is started with:
./run-docker.sh <InstrumentationKey>
Then open the Azure Portal -> Application Insights -> Monitoring -> Logs
Then execute the query:
traces
| sort by timestamp desc
You should be able to see the line with message:
main process started (pid)
But also a message for each request handled:
handling request /logging (pid)
Please note that the messages for request handling are missing.
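For reference, the /logging route in the reproduction project presumably looks something along these lines (an illustrative sketch, not the repository's exact code):

```python
# Illustrative sketch of the reproduction route; the message matches the
# "handling request /logging (pid)" trace expected above.
import logging
import os

from flask import Flask

app = Flask(__name__)
logger = logging.getLogger(__name__)


@app.route("/logging")
def handle_logging():
    # Under Gunicorn this runs in a worker process/thread, which is
    # exactly where the handler fails to export.
    logger.info("handling request /logging (%d)", os.getpid())
    return "ok"
```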
What is the actual behavior? Only the log lines written by the main process appear in Application Insights. Child processes/threads are unable to send log lines to Application Insights. To double-check, try starting Flask without Gunicorn:
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=00000000-0000-0000-0000-000000000000"
python application.py
This way I see each HTTP request being logged successfully. In our context, however, dropping Gunicorn is not an acceptable solution; we started Flask without Gunicorn only to prove that the AzureLogHandler does not work in a multithreaded/multiprocess environment.
Additional context.
See this comment for something that is related. It seems that Gunicorn uses os.fork to spawn its child processes, so the worker thread (in your parent process) that is responsible for exporting your telemetry is not copied.
You can either have the initialization logic (the AzureLogHandler) run anew in each of your child processes (put the logic where the fork happens), or try to use the multiprocessing module like here.
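For example, a minimal sketch of the first approach using Gunicorn's post_fork server hook (assuming APPLICATIONINSIGHTS_CONNECTION_STRING is set in the environment):

```python
# gunicorn.conf.py -- minimal sketch: create the AzureLogHandler after the
# fork, so its exporting worker thread exists in every child process.
# Assumes APPLICATIONINSIGHTS_CONNECTION_STRING is set in the environment.
import logging


def post_fork(server, worker):
    from opencensus.ext.azure.log_exporter import AzureLogHandler

    logger = logging.getLogger()  # or your application's logger
    logger.setLevel(logging.INFO)
    logger.addHandler(AzureLogHandler())
```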
Hi @lzchen, thanks for your suggestion.
I suppose that the change of context would be a problem, and so would sharing the AzureLogHandler between different processes.
So I tried to configure Gunicorn to use multiple threads instead of processes, but without any luck. Please see the GU_THREADS_NUM and GU_WORK_NUM env variables defined in the entrypoint.sh script.
~~But I also tried to create a new logger (and a new AzureLogHandler) for each process/thread, and even this solution seems to fail (here I've added a branch named instance_per_pid where I try to create a new logger for each pid).~~
I believe GU_WORK_NUM represents the number of workers that are spun up (each of which is a new process). In run-docker.sh, what values are you using for GU_THREADS_NUM and GU_WORK_NUM?
Also, can you specify exactly what you do when you are testing creating a new logger instance per process/thread and what you are seeing in your App insights telemetry to indicate that it is not working?
Thanks @lzchen, creating a new logger for each pid (process or thread) has worked. Now I have an AzureLogHandler instance working for each process.
Instead of a global logger, I've now created a getLogger() function that returns a dedicated logger for each process/thread; the working implementation is here in the master branch.
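In outline, the idea is something like this (a simplified sketch, not the exact code from the master branch; assumes the connection string is set in the environment):

```python
# Simplified sketch of a per-process/per-thread logger factory.
import logging
import os
import threading

from opencensus.ext.azure.log_exporter import AzureLogHandler

_loggers = {}
_lock = threading.Lock()


def getLogger():
    # Key the cache by (pid, thread id) so every worker creates its own
    # handler, and therefore its own export thread, after the fork.
    key = (os.getpid(), threading.get_ident())
    with _lock:
        if key not in _loggers:
            logger = logging.getLogger("app.%d.%d" % key)
            logger.setLevel(logging.INFO)
            logger.addHandler(AzureLogHandler())
            _loggers[key] = logger
    return _loggers[key]
```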
> I believe GU_WORK_NUM represents the number of workers that are spun up (each of which is a new process).
yep
> In run-docker.sh, what values are you using for GU_THREADS_NUM and GU_WORK_NUM?
I've tried with GU_THREADS_NUM=3 and GU_WORK_NUM=1, and with GU_WORK_NUM=3 and GU_THREADS_NUM=1. As said, having a new logger instance (and a new AzureLogHandler) for each pid has solved the problem.
getLogger() works well with multiple workers (GU_WORK_NUM > 1); I see a new line in Log Analytics for each process id.
On the other hand, when I try with multiple threads, the load I'm generating with my browser is not enough to spawn many threads. I think I have to write a unit test to double-check this.
BTW, looking at the rows generated in Log Analytics, I see the AzureLogHandler tries to track where it is running, i.e. in which process. Have a look at this screenshot: the customDimensions field is generated automatically by the log handler.
Oh that is useful information. Appreciate the help.
I'm glad you got it working! Is this issue resolved?
Hi @lzchen, I'm not sure I can consider the issue resolved; your mileage may vary :) Just to understand whether we can accept this solution, I'm trying to imagine what will happen at scale. Suppose there are many microservices using this approach: each microservice has a number of loggers depending on how its concurrency is configured. How many AzureLogHandlers can I have open concurrently? Does each of them open a (TCP) connection to the Log Analytics server? How many connections can the Log Analytics server handle? Should I open a new issue to investigate this? Sorry, I don't want to overwhelm you with all these questions; it's just to let you know my doubts.
@freedev Yes those concerns make perfect sense. The way the Azure exporters are designed today, I do not believe that there is a different workaround that would not generate equivalent concerns. Might I suggest opening a feature request to explore possible multi-processing solutions in OpenCensus?
Just an FYI: Take a look at this link. It seemed like using a multiprocessing queue in the exporter worked. Could you possibly test your code using this?
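In broad strokes, that queue-based approach looks something like this (a sketch built on the standard library's QueueHandler/QueueListener, not the exact code from the link):

```python
# Sketch: a single AzureLogHandler in the master process drains a shared
# queue; workers only enqueue records and open no Azure connection.
import logging
import multiprocessing
from logging.handlers import QueueHandler, QueueListener

from opencensus.ext.azure.log_exporter import AzureLogHandler

log_queue = multiprocessing.Queue(-1)


def start_listener():
    # Call once in the master process, before the workers are forked.
    listener = QueueListener(log_queue, AzureLogHandler())
    listener.start()
    return listener


def configure_worker():
    # Call in each worker after the fork.
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(QueueHandler(log_queue))
```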
> Yes those concerns make perfect sense. The way the Azure exporters are designed today, I do not believe that there is a different workaround that would not generate equivalent concerns. Might I suggest opening a feature request to explore possible multi-processing solutions in OpenCensus?
Yep, I will.
> Just an FYI: Take a look at this link. It seemed like using a multiprocessing queue in the exporter worked. Could you possibly test your code using this?
Again, yes. I remember this link comes from the comment you gave me in the first instance, and now, with hindsight, it seems very helpful.
That's great. Are you able to mark the ticket submitted to Azure DevOps as resolved? If this is a new feature, we can continue the conversation here or in the new feature request.
Hi both, I stumbled on this as I am facing the same issue. However, I noticed that removing the --preload flag from my gunicorn command seems to allow logging to App Insights. @freedev, have you seen this as well?
More info on --preload:
The --preload flag tells Gunicorn to "load the app before forking the worker processes". By doing so, each worker is "given a copy of the app, already instantiated by the Master, rather than instantiating the app itself". This means that module-level code, such as creating the app and its log handler, only executes once, in the Master process.
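A hypothetical reconstruction of such code (illustrative; the comment's original snippet is not preserved here):

```python
# application.py -- hypothetical module-level initialization. With
# --preload this runs only once, in the Gunicorn master, before the
# workers are forked, so the handler's export thread is never copied
# into the children.
import logging

from flask import Flask
from opencensus.ext.azure.log_exporter import AzureLogHandler

app = Flask(__name__)
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler())
```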
@freedev I was facing a similar issue. I got it working by using multiprocessing-logging with minimal changes. Usage is as mentioned here. Just make sure you call install_mp_handler() after you have added all the desired handlers.
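A minimal sketch of that setup (assuming multiprocessing-logging and opencensus-ext-azure are installed and the connection string is set in the environment):

```python
# Sketch: multiprocessing-logging wraps the existing handlers so that
# records from forked processes are relayed to the parent for export.
import logging

from multiprocessing_logging import install_mp_handler
from opencensus.ext.azure.log_exporter import AzureLogHandler

logging.getLogger().addHandler(AzureLogHandler())
install_mp_handler()  # must run after all handlers have been added
```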