docker-airflow
"Failed to fetch log file from worker" when running LocalExecutor
I'm seeing this in the web interface when trying to access the logs of a task, but only when running LocalExecutor - could this be a misconfiguration?
*** Log file isn't local.
*** Fetching here: http://33d456e47018:8793/log/g_pipelines/rawlogs_pipeline/2016-10-24T05:30:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
I'm getting the same issue. Any luck figuring this out?
Posted in Airflow's chat:
... I can control the port number of the worker with worker_log_server_port, but the host seems to be inferred somehow:
*** Fetching here: http://02cd4b7f1893:8793/log/functional-tests-v1/collect_only/2016-11-02T00:00:00
Looking through the code, I see the host comes from `TaskInstance.hostname`: https://github.com/apache/incubator-airflow/blob/6f4704a447756d6a17c617afe1a9b54d629c79ac/airflow/www/views.py#L756
So, how could/should I manipulate the hostname such that I can view logs from within the airflow admin UI?
FYI, I can view the logs from my worker by manually going to the URL where it exposes them, so that part works. I just need Airflow's admin UI to be aware of the proper location in the logs view.
I had to add 8793 to the worker's ports so the port is exposed; then I can navigate to the logs at http://[mydockerhost]:8793/log/[dag_id]/[task_id]/[execution_date]
worker:
    image: puckel/docker-airflow:1.7.1.3-3
    restart: always
    environment:
        - EXECUTOR=Celery
    ports:
        - "8793:8793"
    command: worker
So here's the issue as I see it:
The worker creates an /etc/hosts entry for its hostname:
$ docker exec -it worker sh -c "cat /etc/hosts | grep 7c69e75cba80"
172.17.0.30 7c69e75cba80
But the webserver does not:
$ docker exec -it webserver sh -c "cat /etc/hosts | grep 7c69e75cba80"
The assumption by Airflow must be that these two processes are executing on the same host. In a containerized environment this is not the case, so the webserver container needs a hosts mapping to the worker(s).
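A minimal sketch of that mapping in docker-compose (the hostname 7c69e75cba80 and IP 172.17.0.30 are just the values from the example above; they change every time the worker container is recreated, which is what makes this fragile):

webserver:
    image: puckel/docker-airflow:1.7.1.3-3
    # Manually map the worker's container hostname so the webserver can resolve it.
    # These values are taken from the example above and are NOT stable across restarts.
    extra_hosts:
        - "7c69e75cba80:172.17.0.30"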
Since Docker 1.10, Docker no longer updates the /etc/hosts file; it uses an embedded DNS server instead. Port 8793 is opened by the airflow worker, not by the scheduler or webserver, and the worker only runs with CeleryExecutor... so for the moment I have no idea how to deal with this issue with Airflow in LocalExecutor mode.
At work, I'm currently moving our Airflow stack from a bare-metal server to Rkt containers and I'm suffering from the same issue (with CeleryExecutor). If you look at the TaskInstance definition (https://github.com/apache/incubator-airflow/blob/20c7ccc9cc1105643c85ddf5953615ca481c79e2/airflow/models.py#L708), Airflow only uses the hostname.
For the moment, the only solution I can imagine is to use nerve/synapse (http://nerds.airbnb.com/smartstack-service-discovery-cloud/) to fill the /etc/hosts file of the webserver.
I have the same issue. One option that makes troubleshooting a bit easier is to turn on email notifications: the reason for the failure is included in the email, as it seems to be sent by the scheduler. It doesn't fix the issue, but it is a nice workaround.
I've checked and I'm not facing the issue anymore with CeleryExecutor using docker 1.13 and docker-compose 1.11.1
Hi,
I've fixed the problem in branch v1.8.0 (commit d4636f9). Airflow tries to fetch logs on port tcp/8793, which is opened by the airflow worker. If using LocalExecutor, the scheduler and the webserver must be on the same host.
I took a different approach to solve this: declare /usr/local/airflow/logs as a volume in my Dockerfile extending this image, and then have my webserver container use the volumes from the scheduler. This allows having just one process per container. Note that the directory has to be created before declaring it as a volume, otherwise it will be owned by root and the scheduler won't be able to write to it.
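Roughly, the compose side of that approach could look like the sketch below (assuming the compose v2 file format, where volumes_from is still supported; the Dockerfile extending this image would additionally declare VOLUME /usr/local/airflow/logs after creating the directory). The image name is a placeholder for such an extended image:

version: "2"
services:
    scheduler:
        image: my-extended-airflow    # hypothetical image declaring VOLUME /usr/local/airflow/logs
        command: scheduler
    webserver:
        image: my-extended-airflow
        command: webserver
        # Reuse the scheduler's log volume so logs are "local" to the webserver
        # and it never needs to fetch them over port 8793.
        volumes_from:
            - scheduler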
@puckel I'm also having this issue with CeleryExecutor.
- System: Windows 10 Home (Docker Toolbox)
- docker 17.05.0-ce
- docker-compose 1.13.0
- docker-machine 0.11.0
Same issue here. Using 1.8.1, I tried SequentialExecutor, LocalExecutor and CeleryExecutor.
Hi,
I've just started a fresh airflow compose stack (using Celery) and, as you can see, I don't have the issue:
Could you please add more details?
@puckel can you please provide your airflow.cfg? I suspect it is a config issue.
I'm using the docker-airflow config. In the docker compose file, I set LOAD_EX to n and I configure my volume to point to my dags.
@puckel Thanks for the reply, here is my airflow.cfg file: https://gist.github.com/crorella/9b42f5d974eaa18d6f1448b8228df399 - please note sensitive info was replaced with "xxxxxxx". I can't spot where I'm making the mistake.
It's fixed, thanks!
Ok cool. What was the problem?
When I manually trigger the DAG from the UI it fails; if it runs via the scheduler I can see the logs.
Same here, I can't see logs when launching from the UI, nor with airflow trigger_dag; however it seems to work when starting with airflow backfill.
That's really weird. This may be an issue with Airflow then, not this Docker image.
Yes, I tried the same in the standalone version (non docker) and it also fails.
@puckel It seems from your desktop that your image correctly resolves the worker IP from its celery identification.
*** Fetching here: https://THEWORKERID/log/...
Meanwhile this is failing for me, and according to @crorella and @licryle this behavior is related to tasks started from the UI only. Can you confirm this?
If that's the case, I'm going to open this issue in the Apache Airflow issue tracker instead and we could close this one, since it's very likely not a problem with the docker setup but with Airflow itself.
Hi, might it be related to this issue: https://issues.apache.org/jira/browse/AIRFLOW-1355 ?
I don't think so; I can see the celery workers doing their stuff, and tasks are being marked as success/failure, as well as their DAGs.
I am having issues with this as well
https://stackoverflow.com/questions/44780736/setting-up-s3-for-logs-in-airflow
Not sure if it is a bug or incompetence on my part haha. I can't get the logs to work normally or with S3
Having this issue also, just using docker-compose up (Celery) with all the normal files in the repo. Did something change?
Nothing. I opened an issue on Airflow's tracker, but because I have been out of touch with airflow for a while I can't reproduce it again and give it more details.
I tried to search for related issues but found none. Could be incompetence, though. Searching issues on JIRA is kind of a hassle.
Guys, I think the problem is the latest Airflow version.
I am using 1.8.0 and I have the issue.
*** Log file isn't local.
*** Fetching here: http://:8793/log/test_utils/sleeps_forever/2017-07-06T13:04:04.477048
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
udit@udit-capl ~/airflow $ airflow version
[2017-07-06 13:06:01,848] {init.py:57} INFO - Using executor SequentialExecutor
[2017-07-06 13:06:01,957] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-07-06 13:06:02,010] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[Airflow ASCII banner] v1.8.0
I have the same issue with 1.8.1, but in my case it seems like a consequence of #94. If I use Airflow on my machine without Docker (macOS Sierra 10.12.5), launching first the scheduler and then the webserver with SequentialExecutor (i.e. the basic default configuration), it works: DAGs run, and I get the logs. But with this docker compose, DAGs are not launched. It might also be a problem with Airflow itself, but since it works with the basic configuration, this docker compose should be checked as well. It used to work well until 1-2 weeks ago, and I'm not sure what has changed (I use a modified version, but both the modified version and this docker compose have the same problem).
I have:
- Docker: Version 17.06.0-ce-mac18 (18433), Channel: stable
- docker-compose: version 1.14.0, build c7bdf9e
I tried with previous versions of Docker: 1.12.6 (14937) and 17.03.1 (16048), the problem is still the same.
I am not using Docker. I installed Airflow directly on my system (Ubuntu 14.04), not in a virtual environment.
Make sure your volumes in the compose file are not commented out.
@flynnCoolblue Thanks for the suggestion, but this wasn't the case; the problem persists. DAGs are found and loaded by the webserver, but the scheduler and worker don't actually run them:
webserver_1 | [2017-07-06 09:42:20,577] [123] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags
worker_1 | [2017-07-06 09:42:21 +0000] [43] [INFO] Handling signal: ttou
worker_1 | [2017-07-06 09:42:21 +0000] [116] [INFO] Worker exiting (pid: 116)
webserver_1 | [2017-07-06 09:42:21 +0000] [42] [INFO] Handling signal: ttou
webserver_1 | [2017-07-06 09:42:21 +0000] [115] [INFO] Worker exiting (pid: 115)
scheduler_1 | [2017-07-06 09:42:22 +0000] [43] [INFO] Handling signal: ttin
That seems to be a separate issue. From what we've seen here, DAGs are picked up but there's a difference in logging behavior when triggering by backfill or UI/scheduler.
@villasv yes, it's #94, but in my case it most likely triggers #44, so they may be related from what I can understand.
FYI, I solved the issue for me: the problem was a recent Docker update. I had to delete all Docker files, install a previous version and reboot. Now #94 is solved (and this issue as well) with: Version 17.03.0-ce-mac2 (15654)
Hmm. Well, I use 17.05.0-ce. But I'll try uninstalling and cleaning up.
I had the same problem here (airflow v1.8.0). When I used the UI or the trigger_dag command to run my DAG:
airflow trigger_dag <my_dag_here>
nothing was happening, and the logs in the UI showed:
*** Log file isn't local.
*** Fetching here: http://:8793/log/<my_dag_here>/<my_task_here>/2017-07-13T14:08:46
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
Airflow didn't even create folders for the log files.
But it works if I run the command with backfill, as suggested by @licryle:
airflow backfill <my_dag_here> -s -1
+1 same as @EBazarov
Has anyone tested the possible fix from pull request #100?
@sdikby No, it's not fixing this issue.
The problem only occurs when using load_examples = True. I've tested with the tuto.py DAG and LOAD_EX=n in the compose file, and the webserver has no problem accessing logs on the workers.
@EBazarov Do you use a custom DAG or the examples?
Another possible solution I just found on the incubator-airflow-dev mailing list, and I quote (thread name: Fixing logging for Airflow inside Docker):
I've got the same situation. I have dockerized airflow workers are running on different EC2s. To resolve this issue, I've set hostname of the docker container as the ip address of the EC2.
If you are using docker compose, you can add a hostname field to the yaml file. Otherwise, use the -h option to set the hostname.
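As a sketch of what the mailing-list suggestion describes (10.0.1.16 is a placeholder for the EC2 instance's IP, not a value from this thread; the image tag is whichever you run):

worker:
    image: puckel/docker-airflow
    command: worker
    # Set the container hostname to an address the webserver can resolve,
    # since that hostname is what the worker reports to Celery/Airflow.
    hostname: 10.0.1.16
    ports:
        - "8793:8793"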
For me the problem was with jobs not starting; they were forever in the Running state. Possible reasons were:
- The DAG was toggled off
- I had not started the scheduler
After fixing those, the jobs were executed and logs started working.
(I was using manual triggers and the SequentialExecutor.)
Just going to put this here since I've spent a couple hours troubleshooting this over the past few months and have given up multiple times. I'm running docker-airflow (w/ Celery executor) behind a corporate proxy and had HTTP_PROXY environment variables set. This was redirecting the log fetching calls to a proxy authentication web login page.
The fix was to simply add - NO_PROXY=* to the environment: section of the webserver service in my docker compose yml.
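In compose terms, the change described above amounts to something like this (rest of the webserver service elided):

webserver:
    environment:
        # Keep log-fetching requests to the workers from being routed
        # through the corporate proxy.
        - NO_PROXY=*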
For what it's worth, I also see the message below when triggering a DAG manually through the UI using the LocalExecutor.
*** Log file isn't local.
*** Fetching here: http://:8793/log/<my_dag_here>/<my_task_here>/2017-07-13T14:08:46
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
In addition the task seemed to be stuck in a running state.
When I changed from schedule_interval="@once" with manual triggering to schedule_interval="* * * * *" and let the scheduler pick it up, the logs were visible and the job ran as expected.
I had been hoping to use something like a one-time DAG to load connections to bootstrap a container for DAG development.
Just a quick update, I have closed the issue I had opened ([AIRFLOW-1372]) because I haven't been able to reproduce it for a long time so it can't be a bug in Airflow - or not anymore, at least.
@joshuasheppard
*** Fetching here: http://<missing?>:8793/...
Also see the early discussion about name resolution by @puckel.
I had the same problem - I am using CeleryExecutor, though not in Docker. I resolved it by correcting the Celery version - I was using celery 4.0.2, so I removed it and installed a previous version:
pip uninstall celery
pip install 'celery>=3.1.17,<4.0'
I was also observing a 'harmless'-looking message on my workers, "airflow worker: Received and deleted unknown message. Wrong destination?!?", and that's how I came across the celery version recommendation.
Reference - https://medium.com/a-r-g-o/installing-apache-airflow-on-ubuntu-aws-6ebac15db211
Errors:
*** Log file isn't local.
*** Fetching here: http://:8793/log/flywheel_data_validation_dag/flywheel_data_validate/2017-12-05T15:58:06.465615
*** Failed to fetch log file from worker.
Thanks for your comment, but we have already pinned the celery version, i.e. https://github.com/apache/incubator-airflow/blob/1.8.2/setup.py#L118
We are running airflow workers in docker containers on separate machines and had the exact same problem as in the initial post:
*** Log file isn't local.
*** Fetching here: http://33d456e47018:8793/log/g_pipelines/rawlogs_pipeline/2016-10-24T05:30:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
For us, setting the network_mode to host for the airflow worker containers means they pick up the hostname of the machine they are running on. That hostname is reported to Celery, and Airflow uses it to create the log URL, which fixed our problem.
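The relevant compose fragment is a one-liner (a sketch; note that with host networking, port mappings are ignored and the worker's 8793 is exposed directly on the host):

worker:
    image: puckel/docker-airflow    # whichever tag you run
    command: worker
    # Share the host's network stack so, as described above, the worker
    # reports the host's hostname to Celery, which the webserver can resolve.
    network_mode: host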
Two workarounds to fix the issue with docker-compose:
- map the worker container hostname to its IP or real host name.
- add no_proxy to the webserver (if your docker runs behind a proxy).
@lwu08 there might be a third workaround, but I am struggling to get it right: 3. Map a volume to all containers for the log directory so that the webserver picks the logs up from its "local" dir instead of fetching them via HTTP from the workers.
worker_1 | ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler'
The problem is that the volume now has root:root permissions, but of course Airflow in puckel's container is running as the airflow user.
*** Log file does not exist: /root/airflow/logs/test_python/print_ct/2018-03-08T18:33:33+00:00/1.log
*** Fetching from: http://localhost:8793/log/test_python/print_ct/2018-03-08T18:33:33+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='localhost', port=8793): Max retries exceeded with url: /log/test_python/print_ct/2018-03-08T18:33:33+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd399efa250>: Failed to establish a new connection: [Errno 111] Connection refused',))
Any idea how to fix it?
My workaround, you could call it a solution, is to mount a named volume between the scheduler, worker and webserver for the log files so that they appear "local" to the webserver. I am using the CeleryExecutor.
[...]
# for scheduler, worker and webserver
volumes:
    - ./dags/${DEPLOYMENT_CONFIGURATION}/:/usr/local/airflow/dags
    - ./plugins:/usr/local/airflow/plugins
    - ./helper:/usr/local/airflow/helper
    - airflowlogs:/usr/local/airflow/logs
[...]
volumes:
    airflowlogs: {}
Is there a reason not to share all of /usr/local/airflow between all the workers, the scheduler and the webserver? I've got a common NFS share on each of my nodes, so I'm trying airflow-data:/usr/local/airflow/
I made the volume sharing work by using volumes: - ./airflow-logs:/usr/local/airflow/logs. Make sure ./airflow-logs is owned by the airflow user on the host (chown airflow: ./airflow-logs), then restart the service with docker-compose.
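Put together, that is roughly the following sketch (the ownership comment reflects the chown step described above, applied on the host; the same mapping goes on every service that needs the logs):

webserver:
    volumes:
        # Bind-mount a host directory for logs into webserver, scheduler and
        # worker alike, so logs are "local" everywhere and survive restarts.
        # The host directory must be owned by the airflow user first:
        #   chown airflow: ./airflow-logs
        - ./airflow-logs:/usr/local/airflow/logs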
@hmrc87 or @harris-chiu what kind of log file sizes do you get with this setup?
@omegagi it's the regular log files produced for each task in the DAG. The good thing about this setup is that all logs are shared by all docker containers and also persisted on the host machine, so when docker-airflow restarts, we still have the previous logs available.
I had this issue with my installation as well, but I realized it was because the logs directory, after being deleted, changed ownership to 'root'. I ran chown -R ec2-user:ec2-user logs/ to fix the ValueError: Unable to configure handler 'processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler' issue.
Does anyone have a complete workaround to this problem? I think I have the same problem, but I'm not able to make it work :/
After upgrading Airflow to 1.10.6 I am not able to see the log files in the UI. I can see them in the worker location, but I get 404 file not found when trying to access them through the UI. I tried using wget from the webserver and got the same error.
Log file does not exist: /etc/airflow/logs/xxxx/1.log
*** Fetching from: http://x.x.x.x.aws.xx.com:8793/log/xxxx/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://x.x.x.x.aws.xx.com:8793/log/xxxx/1.log
(Sensitive information hidden.)
I also have this issue. I tried a complete uninstall of Docker and upgrading to the latest version. Not working.
*** Log file does not exist: /usr/local/airflow/logs/testing/Read_Calendar/2020-01-01T16:12:15.203501+00:00/1.log
*** Fetching from: http://:8793/log/testing/Read_Calendar/2020-01-01T16:12:15.203501+00:00/1.log
*** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/testing/Read_Calendar/2020-01-01T16:12:15.203501+00:00/1.log': No host supplied
Not really sure what's changed here. I didn't make any config changes, the issue is currently only present in my local docker environment, and the config for that environment hasn't changed in weeks. My other dags seem to run fine as well, just this testing one that is having issues.
Config attached airflow-config-01-02-2020.txt
By setting the network to be that of the host, I was able to work around this. This forces Docker to assign the containers the same hostname as the underlying host.
# On the worker host
docker run \
--net=host \
...
airflow worker
# On the master (webserver, scheduler host)
docker run \
--net=host \
...
airflow scheduler
docker run \
--net=host \
...
airflow webserver
I now see the container hostnames for task logs reflect the underlying hostnames:
*** Log file does not exist: /usr/local/airflow/logs/example-task/2020-01-05T08:57:24+00:00/1.log
*** Fetching from: http://ip-10-0-1-16.ec2.internal:8793/log/example-task/2020-01-05T08:57:24+00:00/1.log
I passed the --hostname $HOSTNAME parameter to docker run for the worker, so it sets the hostname for the docker instance.
@tkaymak @villasv
@manur @AdimUser What if there are multiple docker containers running on the same host machine using port 8793?
This is what I've added to solve the issue:
airflow_scheduler:
hostname: scheduler_host
networks:
airflow_net:
aliases:
- scheduler_host
airflow_worker:
hostname: worker_host
networks:
airflow_net:
aliases:
- worker_host
networks:
airflow_net:
I made the volume sharing work by using volumes: - ./airflow-logs:/usr/local/airflow/logs. Make sure ./airflow-logs is owned by the airflow user on the host (chown airflow: ./airflow-logs), then restart the service with docker-compose.
This indeed solved the issue for me (airflow 1.10.10 set up with docker-compose). Many thanks.
The command chown airflow: ./airflow-logs, however, did not work for me; I received the error chown: invalid spec: ‘airflow:’.
I gave my airflow containers access by modifying the permissions with chmod instead.
I was running the CeleryExecutor when I faced this error. After reading some answers here, I worked around it in my docker-containerized Airflow 1.x by telling it to store everything like "/root/airflow/log.." in a place under /home where I had set chmod 666. I also had a problem with my worker not functioning (I use RabbitMQ as my broker), so I had to get it running as well; both of these changes made the error go away. Now I can read logs in the webserver UI even when the code is running on a remote machine (the very place that previously failed to find or create the log file).
Hope this helps someone.