docker-airflow
FIX: since Airflow 1.10.8, AIRFLOW__ variables are eligible for _CMD behaviour
See pull request on the official Airflow repository: https://github.com/apache/airflow/pull/6801
It was integrated in 1.10.8. The following configuration variables are affected by this change in the image entrypoint:
- Core `fernet_key` configuration can now be managed with `AIRFLOW__CORE__FERNET_KEY_CMD` as well as the usual `AIRFLOW__CORE__FERNET_KEY`
- Core `sql_alchemy_conn` configuration can now be managed with `AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD` as well as the usual `AIRFLOW__CORE__SQL_ALCHEMY_CONN`
- Celery `broker_url` configuration can now be managed with `AIRFLOW__CELERY__BROKER_URL_CMD` as well as the usual `AIRFLOW__CELERY__BROKER_URL`
This PR takes that change into account and fixes errors where the REDIS_*, POSTGRES_* and Fernet key variables used in the entrypoint were not computed correctly.
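As a quick illustration of the two styles (a hedged sketch; the key value and secret path here are made up):

```sh
# Plain variable: the secret value itself sits in the environment
export AIRFLOW__CORE__FERNET_KEY='46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho='

# _CMD variant: the variable holds a command; Airflow runs it in a shell
# and uses its stdout as the config value
export AIRFLOW__CORE__FERNET_KEY_CMD='cat /run/secrets/fernet-key'
```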
Would this allow for a secret to be set like so?

```yaml
services:
  airflow:
    image: puckel/airflow
    secrets:
      - fernet-key
    environment:
      - AIRFLOW__CORE__FERNET_KEY=$(cat /run/secrets/fernet-key)
```
@dinigo yes, but you have to replace AIRFLOW__CORE__FERNET_KEY with AIRFLOW__CORE__FERNET_KEY_CMD and use the correct volumeMount to make sure the /run/secrets/fernet-key file contains the secret's value.
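For the record, a minimal compose sketch of what that looks like (assuming a pre-created Swarm secret named fernet-key, which Swarm mounts at /run/secrets/fernet-key automatically):

```yaml
services:
  airflow:
    image: puckel/airflow
    secrets:
      - fernet-key
    environment:
      # the value is a command, not the key itself; Airflow runs it
      - AIRFLOW__CORE__FERNET_KEY_CMD=cat /run/secrets/fernet-key

secrets:
  fernet-key:
    external: true   # created beforehand with `docker secret create`
```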
I'm rebasing your changes onto my fork, thanks :)
I'm having problems using "AIRFLOW__CELERY__BROKER_URL_CMD" (there are no other hits on Google, just this topic).
Is it correct to point this value at the whole string, like "pyamqp://airflow:airflow@rabbitmq:5672/airflow", in the secret?
AIRFLOW__CELERY__BROKER_URL_CMD=$(cat /run/secrets/broker_url) ?
It never gets the value. I'm also in this topic: https://github.com/puckel/docker-airflow/issues/545
@aalemanq from the documentation of Airflow v1.10.10:
The _cmd config options can also be set using a corresponding environment variable the same way the usual config options can. For example:
`export AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD=bash_command_to_run`
The idea behind this is to not store passwords on boxes in plain text files.
The universal order of precedence for all configuration options is as follows:
- set as an environment variable
- set as a command environment variable
- set in airflow.cfg
- command in airflow.cfg
- Airflow's built-in defaults
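That precedence explains a common trap: if the plain variable is also set (for example by an image entrypoint default), it silently wins over the _CMD variant. A sketch:

```sh
export AIRFLOW__CELERY__BROKER_URL='redis://redis:6379/1'            # e.g. an entrypoint default
export AIRFLOW__CELERY__BROKER_URL_CMD='cat /run/secrets/broker_url'
# Airflow resolves the broker to redis://redis:6379/1 here: the plain
# env var has higher precedence, so the _CMD command is never even run.
```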
It could be that this repo is still on v1.10.9, I don't really know. But you can use the official image apache/airflow:1.10.10, which supports it. The entrypoint changes a little bit but is easily manageable. You can read about it in the blog post about the release.
Anyway, why do you want to store the broker URL as a Docker secret? Do you use a third-party managed Redis service which needs passwords? I don't see the point otherwise.
Hello, I have to secure passwords so that no password gets uploaded to git. I use RabbitMQ.
I tried with 1.10.10 and got the same result. I don't know what happens with these #### secrets in Airflow. It can't be this complicated to pass secrets through the environment like in other software, omg... :(
I don't know why the official doc just gives this info:
https://airflow.readthedocs.io/en/stable/howto/set-config.html and it doesn't work.
Can anybody pass secrets via environment variables in docker-compose to deploy on Swarm????
If you want to get visibility I suggest you file an issue in the apache/airflow repo or ask on StackOverflow (or why not, do both).
@aalemanq
The idea with the variables ending in _CMD is the following:
- you store the confidential value in a secret (Swarm or Kubernetes)
- you mount the secret as a file in your Airflow containers
- you set the xxxx_CMD environment variable corresponding to a valid Airflow config to a command that reads the file (for example with cat, or any mechanism you like best)
Note that not all variables are eligible for _CMD behaviour (see the config documentation). In your case, the broker endpoint config is eligible.
This should work with any recent Airflow version (I use 1.10.9), but be aware that the official apache/airflow Docker image, which is a backport of Airflow 2.0, has issues with its entrypoint that I fixed in puckel/docker-airflow. I need to take a look at the official image and propose a fix.
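On Kubernetes the same three steps look roughly like this (a sketch with made-up names; on Swarm the mount under /run/secrets comes for free):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker
spec:
  containers:
    - name: airflow
      image: puckel/docker-airflow:1.10.9
      env:
        # the _CMD variable holds the command that reads the mounted secret
        - name: AIRFLOW__CELERY__BROKER_URL_CMD
          value: cat /etc/airflow-secrets/broker_url
      volumeMounts:
        - name: airflow-secrets
          mountPath: /etc/airflow-secrets
          readOnly: true
  volumes:
    - name: airflow-secrets
      secret:
        secretName: airflow-secrets
```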
Thanks for your reply NBardelot! Your work here is awesome, so many thanks.
If I understand correctly, it should be simple, no?
I create my secret in docker-compose, it is saved in Swarm and mounted in the container on deploy, and I apply the environment variable in docker-compose like:
AIRFLOW__CELERY__BROKER_URL_CMD=$(cat /run/secrets/broker_url)
(In Docker Swarm this fails because you need to escape the $ with another $: AIRFLOW__CELERY__BROKER_URL_CMD=$$(cat /run/secrets/broker_url). If I use env_file with a single $, same issue.)
This does not work: the broker URL ends up empty, falls back to the default (Redis), and I can't connect to my RabbitMQ using secrets.
If you want, I can show you my whole workflow: applying secrets in Docker Swarm, deploying, and the logs with the _CMD environment variables applied in docker-compose.yml.
I also tried with the latest puckel version, 1.10.9, and the _CMD variable never gets the value :(. If I use normal environment variables without secrets, it works.
@aalemanq You do not need the $() around the command. The python script in charge of managing the configuration of Airflow already includes the execution of a shell to run the command (see airflow/configuration.py and method run_command if you'd like to take a look at the details).
You can just set AIRFLOW__CELERY__BROKER_URL_CMD="cat /run/secrets/broker_url" and it should work.
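In other words, for a _CMD option Airflow itself does something morally equivalent to the following (a sketch of the behaviour, not the actual run_command code):

```sh
# Airflow spawns a shell, runs the _CMD string, and takes its stdout,
# which is why no $() is needed in the variable itself
broker_url="$(bash -c "$AIRFLOW__CELERY__BROKER_URL_CMD")"
echo "$broker_url"   # pyamqp://airflow:airflow@rabbitmq:5672/airflow
```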
In order to debug such a configuration you can (see the session sketch after this list):
- run the container using docker-compose
- exec into the container (you'll probably need to force the entrypoint with --entrypoint /bin/bash)
- check the files that are mounted, to verify that the secret is correctly present in the file you expect
- check yourself the command you'd expect Airflow to use: eval "$AIRFLOW__CELERY__BROKER_URL_CMD" to test that all is working
Well, I have found several errors that confuse me. Thank you very much for your interest and for the support, I will never tire of repeating it.
As you say, I have applied the environment variable:
AIRFLOW__CELERY__BROKER_URL_CMD="cat /run/secrets/broker_url"
Same error: the variable does not take effect at Airflow startup.
Debug:
- docker-compose running with entrypoint: ["sh", "-c", "sleep 2073600"] and the worker command disabled (I'm playing with the worker service, with RabbitMQ and Postgres up)
- inside the container:

```
root@ac88785b0b0c:/usr/local/airflow# cat /run/secrets/broker_url
pyamqp://airflow:airflow@rabbitmq:5672/airflow
```

First of all, before executing /entrypoint.sh manually, I check whether the environment variables are correct. If I execute eval "$AIRFLOW__CELERY__BROKER_URL_CMD" I get:

```
airflow@0e6c1e5913d8:~$ eval "$AIRFLOW__CELERY__BROKER_URL_CMD"
bash: cat /run/secrets/broker_url: No such file or directory
```
I have to restart docker-compose with the following value for the broker_url _CMD env (without the quotes):

```
AIRFLOW__CELERY__BROKER_URL_CMD=cat /run/secrets/broker_url
```

and now, inside the container, the env works:

```
airflow@7ebed0b08ca7:~$ eval "$AIRFLOW__CELERY__BROKER_URL_CMD"
pyamqp://airflow:airflow@rabbitmq:5672/airflow
```
So far everything is perfect, but...
The next step is to run entrypoint.sh manually to simulate the startup of puckel/docker-airflow:1.10.9.
When I run /entrypoint.sh, Airflow upgrades pip and ignores my _CMD environment variable, connecting to the default, Redis (I imagine it enters some branch that ignores the _CMD variables):

```
Mon 04 May 2020 01:26:21 PM UTC - waiting for Redis... 1/20
Mon 04 May 2020 01:26:26 PM UTC - waiting for Redis... 2/20
Mon 04 May 2020 01:26:31 PM UTC - waiting for Redis... 3/20
```

It never takes my _CMD value for broker_url when starting through the entrypoint.
On the other hand, if I execute the command directly instead of running entrypoint.sh manually, I get this:
```
airflow@7ebed0b08ca7:~$ airflow worker
[2020-05-04 13:29:18,910] {{settings.py:253}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=115
[2020-05-04 13:29:19,669] {{cli_action_loggers.py:107}} WARNING - Failed to log action with (psycopg2.errors.UndefinedTable) relation "log" does not exist
LINE 1: INSERT INTO log (dttm, dag_id, task_id, event, execution_dat...
                    ^
[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (%(dttm)s, %(dag_id)s, %(task_id)s, %(event)s, %(execution_date)s, %(owner)s, %(extra)s) RETURNING log.id]
[parameters: {'dttm': datetime.datetime(2020, 5, 4, 13, 29, 19, 645785, tzinfo=<Timezone [UTC]>), 'dag_id': None, 'task_id': None, 'event': 'cli_worker', 'execution_date': None, 'owner': 'airflow', 'extra': '{"host_name": "7ebed0b08ca7", "full_command": "[\'/usr/local/bin/airflow\', \'worker\']"}'}]
(Background on this error at: http://sqlalche.me/e/f405)

 -------------- celery@7ebed0b08ca7 v4.4.0 (cliffs)
--- ***** -----
-- ******* ---- Linux-4.15.0-99-generic-x86_64-with-debian-10.3 2020-05-04 13:29:19
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         airflow.executors.celery_executor:0x7f134c1d4ed0
- ** ---------- .> transport:   amqp://airflow:**@rabbitmq:5672/airflow%0A
- ** ---------- .> results:     postgresql://airflow:**@postgres/airflow
- *** --- * --- .> concurrency: {min=12, max=16} (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> default          exchange=default(direct) key=default

[tasks]
  . airflow.executors.celery_executor.execute_command

[2020-05-04 13:29:22,139] {{settings.py:253}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=122
[2020-05-04 13:29:22,935: ERROR/MainProcess] consumer: Cannot connect to amqp://airflow:**@rabbitmq:5672/airflow%0A: Connection.open: (530) NOT_ALLOWED - vhost airflow
not found.
Trying to reconnect...
```
What is this?
amqp://airflow:**@rabbitmq:5672/airflow%0A <---- what is this %0A?
Is it normal that inside the entrypoint of the 1.10.9 image there is no reference to the _CMD environment variables?
```
airflow@7ebed0b08ca7:~$ grep -i CMD /entrypoint.sh
airflow@7ebed0b08ca7:~$
```
Edit: another test: I'm trying the entrypoint that exists in this topic (0f5b8b1), because I don't understand why the current entrypoint in 1.10.9 doesn't handle the _CMD environment variables. I get the same error as in my last post: Airflow gets the value but appends "%0A" to the end of the string O_O. But at least eval "works" and my environment variable exists, pointing to the secrets.
Apologies for my English and skill, I'm trying to be clear and not disturb you!
Regards!
@aalemanq pleasure to help, I've worked on this, so I'm happy to share...
The %0A is a newline character (\n). When you read the file with cat, the last character is a newline and Airflow thinks it's part of the config value.
You can check this with wc -l /run/secrets/broker_url to count the number of lines of your secret file. It should be 1 in order to be OK. If you have 2, then the extra \n is an issue.
Be careful: when you create the secret, you probably insert a newline at the end of the string without wanting one. Example with the newline (due to echo in this case):

```
echo "the sensitive data" | docker secret create mysecret -
```

Without the newline (-n switch of the echo command):

```
echo -n "the sensitive data" | docker secret create mysecret -
```
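An alternative that cannot emit a trailing newline at all is printf:

```sh
printf '%s' "the sensitive data" | docker secret create mysecret -
```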
> Is it normal that inside the entrypoint of the 1.10.9 image there is no reference to the _CMD environment variables?

It depends on the Airflow image. Are you using the image from puckel/docker-airflow (this project) or the image from the official Airflow project?
The puckel/docker-airflow one should be OK.
But the official image is not. See the following issue and PR I'm currently proposing to Airflow:
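For context, the handling that the patched entrypoint adds is essentially the following (a simplified sketch of the idea, not the exact commit):

```sh
# if the _CMD variant is set, evaluate it and export the result as the
# plain variable before the entrypoint computes its Redis/Postgres defaults
if [ -n "${AIRFLOW__CELERY__BROKER_URL_CMD:-}" ]; then
  AIRFLOW__CELERY__BROKER_URL=$(eval "$AIRFLOW__CELERY__BROKER_URL_CMD")
  export AIRFLOW__CELERY__BROKER_URL
fi
```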
Hello, I'm using puckel/docker-airflow:1.10.9, and if you grep for CMD in the entrypoint you can't see anything related to the _CMD environment variables. Is that normal? O_o. I had to override the entrypoint with https://github.com/puckel/docker-airflow/pull/503/commits/0f5b8b1df363283cbcc285d03781cd3dca148689 in my compose file to keep Airflow from replacing my broker_url with Redis. Do you think I'm on the right track? It works...
About the secrets...

```
airflow@8b0ecb633439:~$ wc -l /run/secrets/broker_url
1 /run/secrets/broker_url
```

It's OK!! O_O
I was creating the secret with docker-compose (file version 3.7), not by hand. Then I created the secret manually with echo -n, used it as external, and IT WORKS!!!! It fking works! I can't believe it, one month wasted O_O.
To sum up, to set up Airflow with _CMD env variables via docker-compose I had to:
- Use this syntax in the environment section (maybe Airflow could put an example like this in their docs...):

```
AIRFLOW__CORE__FERNET_KEY_CMD=cat /run/secrets/fernet_key
AIRFLOW__CELERY__BROKER_URL_CMD=cat /run/secrets/broker_url
AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD=cat /run/secrets/sql_alchemy_conn
AIRFLOW__CELERY__RESULT_BACKEND_CMD=cat /run/secrets/result_backend
```

- Override the entrypoint in my compose file with entrypoint: /entrypoint2.sh from https://github.com/puckel/docker-airflow/pull/503/commits/0f5b8b1df363283cbcc285d03781cd3dca148689, because otherwise broker_url uses the default (Redis) and not my RabbitMQ.
- Create the broker_url secret by hand and use it as external from compose (the other secrets like fernet_key/sql_alchemy_conn work automatically via docker-compose secrets without external); otherwise the secret ends up with a %0A character inside and the error above. A full compose sketch follows below.
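Putting those three points together, the relevant part of my compose file looks roughly like this (a sketch; image tag, service and file names are from my stack, adapt to yours):

```yaml
version: "3.7"

services:
  worker:
    image: puckel/docker-airflow:1.10.9
    # patched entrypoint from commit 0f5b8b1, mounted into the container
    entrypoint: /entrypoint2.sh
    command: worker
    volumes:
      - ./entrypoint2.sh:/entrypoint2.sh
    secrets:
      - fernet_key
      - broker_url
    environment:
      - AIRFLOW__CORE__FERNET_KEY_CMD=cat /run/secrets/fernet_key
      - AIRFLOW__CELERY__BROKER_URL_CMD=cat /run/secrets/broker_url

secrets:
  fernet_key:
    file: ./secrets/fernet_key   # compose-managed file secret works fine
  broker_url:
    external: true               # created with `echo -n ... | docker secret create broker_url -`
```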
Really, I'm astonished. When you go to the official doc you read this: https://airflow.readthedocs.io/en/stable/howto/set-config.html and you tear your hair out trying to make it work O_O.
And if you use the official image it's even worse: the same entrypoint with no _CMD handling. They document it officially, but the entrypoint can't handle it?!? I don't understand anything. Sorry for the rant about my life X)
Regards NBardelot, you saved my day, my month, my mind and more.
You'll need to rebuild the Docker image for puckel/docker-airflow with the commit I propose in this PR in order to make it work. That's the goal of the PR :)
Thanks a lot! Do you think it's the same if I just copy the entrypoint as I do now, or do I lose some features? If I can avoid building an image, all the better for my devs...
Thanks!
PS: thanks, I had never read about the PR :D
The previous commit touching the script/entrypoint.sh file is a1d70c6 (a commit I made to fix other issues where AIRFLOW__CELERY__BROKER_URL and AIRFLOW__CORE__SQL_ALCHEMY_CONN were not taken into account). It was added in 1.10.8.
So if you use 1.10.9 you can safely replace the whole script using the commit of this PR.
Thanks for sticking around ;)
Does anybody know if AIRFLOW__CELERY__FLOWER_BASIC_AUTH_CMD is implemented? I can't see any trace of this environment variable in the entrypoint.
Regards!!
It is not yet implemented. That's why I propose this PR with a commit to implement it.