
"Failed to fetch log file from worker" when running LocalExecutor

Open tkaymak opened this issue 8 years ago • 70 comments

I'm seeing this in the web interface when trying to access the logs of a task, but only when running LocalExecutor - could this be a misconfiguration?

*** Log file isn't local.
*** Fetching here: http://33d456e47018:8793/log/g_pipelines/rawlogs_pipeline/2016-10-24T05:30:00
*** Failed to fetch log file from worker.

*** Reading remote logs...
*** Unsupported remote log location.

tkaymak avatar Oct 25 '16 14:10 tkaymak

I'm getting the same issue. Any luck figuring this out?

westhomas avatar Nov 03 '16 15:11 westhomas

Posted in Airflow's chat:

... I can control the port number of the worker with worker_log_server_port but the host seems to be inferred somehow: *** Fetching here: http://02cd4b7f1893:8793/log/functional-tests-v1/collect_only/2016-11-02T00:00:00

Looking through the code, I see the host comes from `TaskInstance.hostname`: https://github.com/apache/incubator-airflow/blob/6f4704a447756d6a17c617afe1a9b54d629c79ac/airflow/www/views.py#L756

So, how could/should I manipulate the hostname such that I can view logs from within the airflow admin UI?

FYI, I can view the logs from my worker by manually going to the URL where it exposes them. So that's working. I just need airflow's admin to be aware of the proper location in the logs UI.

I had to add 8793 to the ports section for the worker to expose the port; then I can navigate to the logs at http://[mydockerhost]:8793/log/[dag_id]/[task_id]/[execution_date]

worker:
    image: puckel/docker-airflow:1.7.1.3-3
    restart: always
    environment:
        - EXECUTOR=Celery
    ports:
        - "8793:8793"
    command: worker

westhomas avatar Nov 03 '16 19:11 westhomas

So here's the issue as I see it:

The worker creates a /etc/hosts entry for its hostname:

$ docker exec -it worker sh -c "cat /etc/hosts | grep 7c69e75cba80"
172.17.0.30 7c69e75cba80

But the webserver does not:

$ docker exec -it webserver sh -c "cat /etc/hosts | grep 7c69e75cba80"

The assumption by airflow must be that these two processes are executing on the same host. In a containerized env, this is not the case. Therefore the webserver container needs a hosts mapping to the worker(s).
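
One way to give the webserver such a mapping in docker-compose is extra_hosts; a minimal sketch, assuming the worker's container hostname is 7c69e75cba80 and its IP is 172.17.0.30 as in the example above:

webserver:
    image: puckel/docker-airflow:1.7.1.3-3
    extra_hosts:
        # map the worker container's hostname to its IP so the
        # http://<worker-hostname>:8793/log/... URL resolves from the webserver
        - "7c69e75cba80:172.17.0.30"

The drawback is that container IDs and IPs change whenever the worker is recreated, so this only works for a fairly static setup.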

westhomas avatar Nov 04 '16 17:11 westhomas

Since Docker 1.10, Docker no longer updates the /etc/hosts file; it uses an embedded DNS server instead. Port 8793 is opened by the airflow worker, not by the scheduler or webserver. And the worker only runs with CeleryExecutor... so for the moment I have no idea how to deal with this issue with airflow in LocalExecutor mode.

puckel avatar Nov 07 '16 15:11 puckel

At work, I'm currently moving our Airflow stack from a bare-metal server to rkt containers and I'm hitting the same issue (with CeleryExecutor). If you look at the TaskInstance definition (https://github.com/apache/incubator-airflow/blob/20c7ccc9cc1105643c85ddf5953615ca481c79e2/airflow/models.py#L708), airflow only uses the hostname.

For the moment, the only solution I imagine is to use nerve/synapse (http://nerds.airbnb.com/smartstack-service-discovery-cloud/) to fill the /etc/hosts file of the webserver

puckel avatar Feb 13 '17 13:02 puckel

I have the same issue. One option that makes troubleshooting a bit easier is to turn on email notifications. The reason for the failure is included in the email, as it seems to be sent by the scheduler. It doesn't fix the issue, but it is a nice workaround.

jkuruzovich avatar Feb 20 '17 17:02 jkuruzovich

I've checked and I'm not facing the issue anymore with CeleryExecutor using docker 1.13 and docker-compose 1.11.1

puckel avatar Feb 20 '17 19:02 puckel

Hi,

I've fixed the problem in branch v1.8.0 (commit d4636f9). Airflow tries to fetch logs on port tcp/8793, which is opened by the airflow worker. If using LocalExecutor, the scheduler and the webserver must be on the same host.

puckel avatar Feb 22 '17 12:02 puckel

I took a different approach to solve this, which was to declare /usr/local/airflow/logs as a volume in my Dockerfile extending this image, and then to have my webserver container use the volumes from the scheduler. This allows having just one process per container. Note that the directory has to be created before declaring it as a volume otherwise it will be owned by root and the scheduler won't be able to write to it.
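
A rough sketch of that setup, assuming a hypothetical image name my-airflow built from a Dockerfile that adds RUN mkdir -p /usr/local/airflow/logs followed by VOLUME /usr/local/airflow/logs, and a v2 compose file (volumes_from was removed in the v3 format):

scheduler:
    image: my-airflow        # hypothetical image built from that Dockerfile
    restart: always
    command: scheduler
webserver:
    image: my-airflow
    restart: always
    command: webserver
    volumes_from:
        # reuse the scheduler's log volume so the webserver reads the
        # task logs as local files instead of fetching them over :8793
        - scheduler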

fmcgough avatar Mar 03 '17 13:03 fmcgough

@puckel I'm also having this issue with CeleryExecutor.

  • System: Windows 10 Home (Docker Toolbox)
  • docker 17.05.0-ce
  • docker-compose 1.13.0
  • docker-machine 0.11.0

villasv avatar May 26 '17 17:05 villasv

Same issue here. Using 1.8.1, I tried SequentialExecutor, LocalExecutor and CeleryExecutor.

crorella avatar May 29 '17 00:05 crorella

Hi,

I've just started a fresh airflow compose stack (using Celery) and, as you can see, I don't have the issue (screenshot: 2017-05-29-100020_1187x644_scrot).

Could you please add details?

puckel avatar May 29 '17 08:05 puckel

@puckel can you please provide your airflow.cnf? I suspect it is a conf issue.

crorella avatar May 30 '17 23:05 crorella

I'm using the docker-airflow config. In the docker compose file, I set LOAD_EX to n and I configure my volume to point to my dags.

puckel avatar May 31 '17 07:05 puckel

@puckel Thanks for the reply, here is my airflow.cnf file: https://gist.github.com/crorella/9b42f5d974eaa18d6f1448b8228df399 (please note sensitive info was replaced with "xxxxxxx"). I can't spot where I'm making the mistake.

crorella avatar May 31 '17 16:05 crorella

It's fixed, thanks!

crorella avatar Jun 01 '17 16:06 crorella

Ok cool. What was the problem?

puckel avatar Jun 01 '17 19:06 puckel

When I manually trigger the DAG from the UI it fails; if it runs via the scheduler, I can see the logs.

crorella avatar Jun 01 '17 22:06 crorella

Same here: I can't see logs when launching from the UI, nor with airflow trigger_dag; however, it seems to work when starting with airflow backfill.

licryle avatar Jun 12 '17 05:06 licryle

That's really weird. This may be an issue with Airflow then, not this Docker image.

villasv avatar Jun 12 '17 22:06 villasv

Yes, I tried the same in the standalone version (non docker) and it also fails.

crorella avatar Jun 14 '17 02:06 crorella

@puckel It seems from your screenshot that your image correctly resolves the worker IP from its celery identification.

*** Fetching here: https://THEWORKERID/log/...

Meanwhile, this is failing for me, and according to @crorella and @licryle this behavior is related only to tasks started from the UI. Can you confirm this?

If that's the case, I'm going to open this issue in the Apache Airflow issue tracker instead and we could close this one, since it's very likely not a problem with the docker setup but with Airflow itself.

villasv avatar Jun 29 '17 13:06 villasv

Hi, Might it be related to this issue https://issues.apache.org/jira/browse/AIRFLOW-1355 ?

puckel avatar Jun 30 '17 12:06 puckel

I don't think so, I can see the celery workers doing their stuff, and tasks are being marked as success/failure as well as their DAGs.

villasv avatar Jun 30 '17 17:06 villasv

I am having issues with this as well

https://stackoverflow.com/questions/44780736/setting-up-s3-for-logs-in-airflow

Not sure if it is a bug or incompetence on my part haha. I can't get the logs to work normally or with S3

JackStat avatar Jul 03 '17 16:07 JackStat

Having this issue also. Just using docker-compose up (Celery) with all the normal files in the repo. Did something change?

flynnCoolblue avatar Jul 04 '17 20:07 flynnCoolblue

Nothing. I opened an issue on Airflow's tracker, but because I have been out of touch with airflow for a while I can't reproduce it again and give it more details.

I tried to search for related issues but found none. Could be incompetence, though. Searching issues on JIRA is kind of a hassle.

villasv avatar Jul 04 '17 22:07 villasv

Guys I think the problem is the latest airflow version.

marcusesa avatar Jul 06 '17 01:07 marcusesa

I am using 1.8.0 and I have the issue.

JackStat avatar Jul 06 '17 01:07 JackStat

*** Log file isn't local.
*** Fetching here: http://:8793/log/test_utils/sleeps_forever/2017-07-06T13:04:04.477048
*** Failed to fetch log file from worker.

*** Reading remote logs...
*** Unsupported remote log location.

I am using 1.8.0 and I have the issue.

udit@udit-capl ~/airflow $ airflow version
[2017-07-06 13:06:01,848] {init.py:57} INFO - Using executor SequentialExecutor
[2017-07-06 13:06:01,957] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-07-06 13:06:02,010] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[Airflow ASCII-art banner] v1.8.0
udit@udit-capl ~/airflow $

Uditmittal avatar Jul 06 '17 07:07 Uditmittal

I have the same issue with 1.8.1, but in my case it seems like a consequence of #94. If I use Airflow on my machine without Docker (macOS Sierra 10.12.5), launching first the scheduler and then the webserver with SequentialExecutor (i.e. the basic default configuration), it works: DAGs run, and I get the logs. But with this docker compose, DAGs are not launched. It might also be a problem with Airflow itself, but since it works with the basic configuration, this docker compose should be checked as well. It used to work well until 1-2 weeks ago; I'm not sure what has changed (I use a modified version, but both the modified version and this docker compose have the same problem).

I have:

  • Docker: Version 17.06.0-ce-mac18 (18433), Channel: stable
  • docker-compose: version 1.14.0, build c7bdf9e

openp2pdesign avatar Jul 06 '17 07:07 openp2pdesign

I tried with previous versions of Docker: 1.12.6 (14937) and 17.03.1 (16048), the problem is still the same.

openp2pdesign avatar Jul 06 '17 08:07 openp2pdesign

I am not trying this with Docker. I installed it on my actual system (Ubuntu 14.04), not in a virtual environment.

Uditmittal avatar Jul 06 '17 08:07 Uditmittal

Make sure your volumes in the compose file are not commented out.

flynnCoolblue avatar Jul 06 '17 08:07 flynnCoolblue

@flynnCoolblue Thanks for the suggestion, but this wasn't the case; the problem persists. DAGs are found and launched in the webserver, but the scheduler and worker actually don't run them:

webserver_1  | [2017-07-06 09:42:20,577] [123] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags
worker_1     | [2017-07-06 09:42:21 +0000] [43] [INFO] Handling signal: ttou
worker_1     | [2017-07-06 09:42:21 +0000] [116] [INFO] Worker exiting (pid: 116)
webserver_1  | [2017-07-06 09:42:21 +0000] [42] [INFO] Handling signal: ttou
webserver_1  | [2017-07-06 09:42:21 +0000] [115] [INFO] Worker exiting (pid: 115)
scheduler_1  | [2017-07-06 09:42:22 +0000] [43] [INFO] Handling signal: ttin

openp2pdesign avatar Jul 06 '17 09:07 openp2pdesign

That seems to be a separate issue. From what we've seen here, DAGs are picked up but there's a difference in logging behavior when triggering by backfill or UI/scheduler.

villasv avatar Jul 06 '17 10:07 villasv

@villasv yes, it's #94, but in my case it most likely triggers #44, so they may be related, from what I can understand.

openp2pdesign avatar Jul 06 '17 10:07 openp2pdesign

FYI I solved the issue for me: the problem was a recent Docker update. So I had to delete all Docker files, install a previous version and reboot. Now #94 is solved (and this issue as well) with: Version 17.03.0-ce-mac2 (15654)

openp2pdesign avatar Jul 10 '17 06:07 openp2pdesign

Hmm. Well, I use 17.05.0-ce. But I'll try uninstalling and cleaning up.

villasv avatar Jul 10 '17 14:07 villasv

I had the same problem here (airflow v1.8.0). When I used the UI or the trigger_dag command to run my dag:

airflow trigger_dag <my_dag_here>

Nothing was happening, and the logs in the UI show:

*** Log file isn't local.
*** Fetching here: http://:8793/log/<my_dag_here>/<my_task_here>/2017-07-13T14:08:46
*** Failed to fetch log file from worker.

*** Reading remote logs...
*** Unsupported remote log location.

Airflow didn't even create the folders for the log files.

But it works if I run the command with backfill, as suggested by @licryle:

airflow backfill <my_dag_here> -s -1

EBazarov avatar Jul 13 '17 14:07 EBazarov

+1 same as @EBazarov

dmateusp avatar Jul 14 '17 13:07 dmateusp

Did anyone test the possible fix from pull request #100?

sdikby avatar Jul 15 '17 15:07 sdikby

@sdikby No, it's not fixing this issue.

The problem only occurs when using load_examples = True. I've tested the tuto.py dag with LOAD_EX=n in the compose file, and the webserver has no problem accessing logs on the workers.

@EBazarov Do you use a custom dag or the examples?

puckel avatar Jul 27 '17 08:07 puckel

Another possible solution I just found on the incubator-airflow-dev mailing list, and I quote (thread name: Fixing logging for Airflow inside Docker):

I've got the same situation. I have dockerized airflow workers running on different EC2s. To resolve this issue, I've set the hostname of the docker container to the IP address of the EC2.

If you are using docker compose, you can add a hostname field to the yaml file. Otherwise, use the -h option to set the hostname.
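
A minimal sketch of the docker-compose variant described above, assuming the EC2 host's address is 10.0.1.16 (purely illustrative); with plain docker run the equivalent would be the -h/--hostname flag:

worker:
    image: puckel/docker-airflow
    command: worker
    environment:
        - EXECUTOR=Celery
    # report a reachable address instead of the random container ID,
    # so the webserver can fetch logs at http://10.0.1.16:8793/...
    hostname: 10.0.1.16
    ports:
        - "8793:8793"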

sdikby avatar Aug 23 '17 14:08 sdikby

For me the problem was with jobs not starting; they were forever in the Running state. Possible reasons were:

  • The DAG was toggled off
  • I had not started the scheduler

After fixing those, the jobs were executed and logs started working.

(I was using manual triggers and the SequentialExecutor)

antru avatar Oct 03 '17 10:10 antru

Just going to put this here since I've spent a couple hours troubleshooting this over the past few months and have given up multiple times. I'm running docker-airflow (w/ Celery executor) behind a corporate proxy and had HTTP_PROXY environment variables set. This was redirecting the log fetching calls to a proxy authentication web login page.

The fix was to simply add - NO_PROXY=* to the environment: section of the webserver service in my docker compose yml.
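
For reference, a sketch of what that looks like in the webserver service (everything except the added NO_PROXY line is just the usual setup):

webserver:
    image: puckel/docker-airflow
    restart: always
    environment:
        - EXECUTOR=Celery
        # keep the internal calls to the workers' :8793 log server
        # from being routed through the corporate proxy
        - NO_PROXY=*
    ports:
        - "8080:8080"
    command: webserver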

ajhodges avatar Oct 06 '17 16:10 ajhodges

For what it's worth, I am also seeing the message below when I triggered a DAG manually through the UI using the LocalExecutor.

 *** Log file isn't local.
 *** Fetching here: http://:8793/log/<my_dag_here>/<my_task_here>/2017-07-13T14:08:46
 *** Failed to fetch log file from worker.
 
 *** Reading remote logs...
 *** Unsupported remote log location.

In addition the task seemed to be stuck in a running state.

When I changed from schedule_interval="@once" with a manual trigger to schedule_interval="* * * * *" and let the scheduler pick it up, the logs were visible as expected and the job ran as expected.

I had been hoping to use something like a one-time DAG to load connections to bootstrap a container for DAG development.

joshuasheppard avatar Nov 29 '17 02:11 joshuasheppard

Just a quick update: I have closed the issue I had opened (AIRFLOW-1372) because I haven't been able to reproduce it for a long time, so it can't be a bug in Airflow - or not anymore, at least.

@joshuasheppard *** Fetching here: http://<missing?>:8793/...

Also see the early discussion about name resolution by @puckel.

villasv avatar Nov 30 '17 11:11 villasv

I had the same problem - I am using CeleryExecutor, though not in Docker. I got it resolved by correcting the Celery version - I was using celery 4.0.2, so I removed it and installed a previous version:

pip uninstall celery
pip install 'celery>=3.1.17,<4.0'

I was also observing a 'harmless' looking message on my workers "airflow worker: Received and deleted unknown message. Wrong destination?!?", and I came across the celery version recommendation.

Reference - https://medium.com/a-r-g-o/installing-apache-airflow-on-ubuntu-aws-6ebac15db211

errors -

*** Log file isn't local.
*** Fetching here: http://:8793/log/flywheel_data_validation_dag/flywheel_data_validate/2017-12-05T15:58:06.465615
*** Failed to fetch log file from worker.

AdityaLad avatar Dec 05 '17 11:12 AdityaLad

Thanks for your comment, but we already fixed the celery version.

i.e. https://github.com/apache/incubator-airflow/blob/1.8.2/setup.py#L118

puckel avatar Dec 05 '17 16:12 puckel

This may be related to issue AIRFLOW-1735:

Log files do not show up for unscheduled dags.

kehao95 avatar Dec 18 '17 06:12 kehao95

We are running airflow workers in docker containers on separate machines and had the exact same problem as in the initial post:

*** Log file isn't local.
*** Fetching here: http://33d456e47018:8793/log/g_pipelines/rawlogs_pipeline/2016-10-24T05:30:00
*** Failed to fetch log file from worker.

*** Reading remote logs...
*** Unsupported remote log location.

For us, setting network_mode to host for the airflow worker containers means they pick up the hostname of the machine they are running on. That hostname is reported to Celery, and Airflow uses it to build the log URL, which fixed our problem.
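
A sketch of that change for a worker service; note that with host networking a ports mapping is unnecessary (and ignored), since 8793 is bound directly on the machine:

worker:
    image: puckel/docker-airflow
    restart: always
    environment:
        - EXECUTOR=Celery
    # share the host's network stack: Celery then reports the real machine
    # hostname, which the webserver uses to build the :8793 log URL
    network_mode: host
    command: worker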

DemonTPx avatar Dec 29 '17 07:12 DemonTPx

Two workarounds to fix the issue with docker-compose:

  1. Map the worker container hostname to its IP or real host name.
  2. Add no_proxy to the webserver (if your Docker runs behind a proxy).

lwu08 avatar Feb 02 '18 21:02 lwu08

@lwu08 there might be a third workaround, but I am struggling to get it right: 3. Map a volume into all containers for the log directory, so that the webserver picks the logs up from its "local" dir instead of fetching them via HTTP from the workers.

worker_1 | ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler'

The problem is that the volume now has root:root permissions, but of course airflow in puckel's container is running as the airflow user.

hmrc87 avatar Feb 13 '18 07:02 hmrc87

*** Log file does not exist: /root/airflow/logs/test_python/print_ct/2018-03-08T18:33:33+00:00/1.log
*** Fetching from: http://localhost:8793/log/test_python/print_ct/2018-03-08T18:33:33+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='localhost', port=8793): Max retries exceeded with url: /log/test_python/print_ct/2018-03-08T18:33:33+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd399efa250>: Failed to establish a new connection: [Errno 111] Connection refused',))

Any idea how to fix it?

vinaybabunaidu avatar Mar 08 '18 18:03 vinaybabunaidu

My workaround (you could call it a solution) is to mount a named volume shared between the scheduler, worker and webserver for the log files, so that they appear "local" to the webserver. I am using the CeleryExecutor.

[...]
# for scheduler, worker and webserver
    volumes:
        - ./dags/${DEPLOYMENT_CONFIGURATION}/:/usr/local/airflow/dags
        - ./plugins:/usr/local/airflow/plugins
        - ./helper:/usr/local/airflow/helper
        - airflowlogs:/usr/local/airflow/logs
[...]
volumes:
    airflowlogs: {}

hmrc87 avatar Mar 09 '18 07:03 hmrc87

Is there a reason not to share all of /usr/local/airflow between all the workers, the scheduler and the webserver?

I've got a common NFS share on each of my nodes, so I am trying airflow-data:/usr/local/airflow/

SvenDowideit avatar Jun 12 '18 05:06 SvenDowideit

I made the volume sharing work by using volumes: - ./airflow-logs:/usr/local/airflow/logs

and we have to make sure ./airflow-logs is owned by the airflow user on the host: chown airflow: ./airflow-logs

Then restart the services with docker-compose.
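
Roughly, that corresponds to the same volumes entry on the webserver, scheduler and worker services (the service name shown here is assumed):

webserver:
    image: puckel/docker-airflow
    command: webserver
    volumes:
        # host directory, pre-created and chown'ed to the airflow user,
        # shared by all services so the webserver reads the logs locally
        - ./airflow-logs:/usr/local/airflow/logs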

harris-chiu avatar Aug 10 '18 23:08 harris-chiu

@hmrc87 or @harris-chiu what size of log files do you get with this setup?

omegagi avatar Aug 16 '18 04:08 omegagi

@omegagi it's the regular log files produced for each task in the dag. The good thing about that setup is that all logs are shared by all docker containers and also persisted on the host machine. When docker-airflow is restarted, we still have the previous logs available.

harris-chiu avatar Aug 16 '18 17:08 harris-chiu

I had this issue with my installation as well, but I realized it was because the logs directory, after being deleted, changed ownership to 'root'. I ran chown -R ec2-user:ec2-user logs/ to fix the ValueError: Unable to configure handler 'processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler' issue.

nyaessuman avatar Jul 23 '19 15:07 nyaessuman

Does anyone have a complete workaround for this problem? I think I have the same problem, but I'm not able to make it work :/

LvffY avatar Dec 25 '19 16:12 LvffY

After upgrading Airflow to 1.10.6, I am not able to see the log files in the UI. I can see them in the worker location, but I get a 404 file not found error when trying to access them through the UI. I tried using wget from the webserver and got the same error.

*** Log file does not exist: /etc/airflow/logs/xxxx/1.log
*** Fetching from: http://x.x.x.x.aws.xx.com:8793/log/xxxx/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://x.x.x.x.aws.xx.com:8793/log/xxxx/1.log

(Sensitive information hidden.)

sahilsapolia avatar Dec 27 '19 23:12 sahilsapolia

I also have this issue. I tried doing a complete uninstall of Docker and upgrading to the latest version. Not working.

*** Log file does not exist: /usr/local/airflow/logs/testing/Read_Calendar/2020-01-01T16:12:15.203501+00:00/1.log
*** Fetching from: http://:8793/log/testing/Read_Calendar/2020-01-01T16:12:15.203501+00:00/1.log
*** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/testing/Read_Calendar/2020-01-01T16:12:15.203501+00:00/1.log': No host supplied

Not really sure what's changed here. I didn't make any config changes, the issue is currently only present in my local docker environment, and the config for that environment hasn't changed in weeks. My other dags seem to run fine as well, just this testing one that is having issues.

Config attached airflow-config-01-02-2020.txt

maxfactor1 avatar Jan 02 '20 16:01 maxfactor1

By setting the network to be that of the host, I was able to work around this. This forces docker to assign the same hostname to containers as the underlying host.

# On the worker host
docker run \
  --net=host \
  ...
  airflow worker
# On the master (webserver, scheduler host)
docker run \
  --net=host \
  ...
  airflow scheduler

docker run \
  --net=host \
  ...
  airflow webserver

I now see the container hostnames for task logs reflect the underlying hostnames:

*** Log file does not exist: /usr/local/airflow/logs/example-task/2020-01-05T08:57:24+00:00/1.log
*** Fetching from: http://ip-10-0-1-16.ec2.internal:8793/log/example-task/2020-01-05T08:57:24+00:00/1.log

manur avatar Jan 05 '20 09:01 manur

I passed the

--hostname $HOSTNAME

docker parameter to docker run for the worker, so it sets the hostname for the container.

@tkaymak @villasv

AdimUser avatar Feb 03 '20 09:02 AdimUser

@manur @AdimUser What if there are multiple docker containers running on the same host machine using port 8793?

amith1893 avatar Apr 03 '20 22:04 amith1893

This is what I've added to solve the issue:

  airflow_scheduler:
    hostname: scheduler_host
    networks:
      airflow_net:
        aliases:
          - scheduler_host
  airflow_worker:
    hostname: worker_host
    networks:
      airflow_net:
        aliases:
          - worker_host
networks:
  airflow_net:

Vanuan avatar Apr 12 '20 17:04 Vanuan

I made the volume sharing work by using volumes:

  • ./airflow-logs:/usr/local/airflow/logs

and we have to make sure ./airflow-logs is owned by the airflow user on the host: chown airflow: ./airflow-logs

restart the service with docker-compose

This indeed solved the issue for me (airflow 1.10.10 set up with docker-compose). Many thanks.

The command chown airflow: ./airflow-logs however did not work for me. I received the error chown: invalid spec: ‘airflow:’.

I gave my airflow containers access by modifying the permissions with chmod instead.

quinten-goens avatar Jun 16 '20 10:06 quinten-goens

I was running the CeleryExecutor when I faced this error. After reading some answers here, I figured my docker-containerized Airflow version 1 could get past the error if I told it to store everything it keeps under "/root/airflow/log.." in a place inside /home that I had chmod'ed to 666. I also had a problem with my worker: it wasn't functioning, and as I was using RabbitMQ as my broker, I had to get it working. Both of these changes made the error go away. Now I can read logs in the webserver UI even when the code is running on a remote machine (the very place that was showing me the inability to find or create the log file).

Hope this helps someone.

AramisN avatar Jul 18 '21 11:07 AramisN