
Fix docker compose command in "Running Airflow in Docker"

Status: Open · jtommi opened this issue 1 year ago · 11 comments

On the page Running Airflow in Docker, running docker compose up airflow-init with the provided docker-compose.yaml does not initialize the database.
It executes the command provided in docker-compose.yaml followed by /entrypoint, neither of which initializes the database.

What worked for me was running docker compose run airflow-init ("run" instead of "up").

Also (not included in this PR), the message after initialization looked like this for me (Docker Desktop on Windows 11):

[2024-01-05T09:51:47.658+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
[2024-01-05T09:51:47.934+0000] {override.py:1458} INFO - Added user airflow
User "airflow" created with role "Admin"
2.8.0

rather than what the docs show:

airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.8.0
start_airflow-init_1 exited with code 0


jtommi, Jan 05 '24 09:01

> Also (not included in this PR), the message after initialization for me looked like this (Docker-Desktop on Windows 11):
>
> [2024-01-05T09:51:47.658+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
> [2024-01-05T09:51:47.934+0000] {override.py:1458} INFO - Added user airflow
> User "airflow" created with role "Admin"
> 2.8.0

Yes, because that's how run works. The output you see in the docs comes from the up command. The run command is more like the docker run command: it generally does not start the service as such, but runs a container that belongs to the service, usually with a custom command (for example, it will not open the ports associated with the service). So the output of the run command is very different from the output of the up command. This is quite expected.

potiuk, Jan 05 '24 12:01

And just a general comment, @jtommi: the fact that something does not work for you is not always a reason to change things, because it may be that YOU have a specific problem which needs investigating. When you develop a product, you have to think of many people, and whenever you make a change you need to consider the impact it might have on others. That's why we generally have tests for such things, and why we try to do things in the most portable way, one that we believe is good for others. So to make such a change, "it works for me" is not enough: we need to know why it did not work the other way (which it should, and which should be tested), and only once we know why should we change it (and adapt the tests).

potiuk, Jan 05 '24 12:01

And another comment: I am not 100% sure it works now. It might be that it does not and our tests simply did not catch it; we just need to understand what's going on (and also fix the tests if that's the case).

potiuk, Jan 05 '24 12:01

BTW, from a quick look, it is quite possible that it does not work.

The airflow-init container runs airflow version, which I think COULD in the past initialize the DB as a side effect, but this could have changed since. It's quite possible that airflow version does not initialize the DB now (because if it did, it was a side effect).

If that's the case, the right solution is to change this command to one that actually initializes the DB rather than changing the documentation, because it would mean that the airflow-init service is simply badly defined now.

It also seems that our tests do not actually run airflow-init, so in that case the right fix is to change our tests to run the init, verify that the database has been created, and only then run the whole compose setup.
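The test flow proposed above (init, verify, then start everything) can be sketched in Python, the language of the existing docker_tests. This is a minimal sketch under stated assumptions, not the project's actual test: the command runner is injected so the ordering logic can be exercised without Docker; the init step uses docker compose up --exit-code-from airflow-init airflow-init so that its exit code is observable; and the verification command (airflow db check-migrations through the airflow-cli service) is a hypothetical choice. INIT, VERIFY, START, quick_start and subprocess_runner are illustrative names.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A runner executes one command line and returns its exit code.
Runner = Callable[[Sequence[str]], int]

# Assumed command lines for each step; adjust to the compose file under test.
INIT = ["docker", "compose", "up", "--exit-code-from", "airflow-init", "airflow-init"]
# Hypothetical verification step: ask Airflow whether all DB migrations have been applied.
VERIFY = ["docker", "compose", "run", "--rm", "airflow-cli", "airflow", "db", "check-migrations"]
# Start the full stack and wait for services to be running/healthy (as the existing tests do).
START = ["docker", "compose", "up", "-d", "--wait"]

STEPS: List[Tuple[str, List[str]]] = [("init", INIT), ("verify", VERIFY), ("start", START)]


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Run a command for real and return its exit code (requires Docker)."""
    return subprocess.run(list(cmd), check=False).returncode


def quick_start(run: Runner = subprocess_runner) -> List[str]:
    """Run init, verify and start in order, stopping at the first failing step.

    Returns the names of the completed steps.
    """
    completed: List[str] = []
    for name, cmd in STEPS:
        code = run(cmd)
        if code != 0:
            raise RuntimeError(f"step {name!r} failed with exit code {code}: {' '.join(cmd)}")
        completed.append(name)
    return completed
```

Injecting the runner keeps the "only start the stack after the DB is verified" rule testable on its own; a real test would pass subprocess_runner (the default) and then make its usual HTTP checks against the webserver.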

potiuk, Jan 05 '24 12:01

Thanks for all those insights, very helpful. I'm really sorry to have wasted your time, @potiuk and @Taragolis. I was pretty confident because running docker-compose up airflow-init literally said:

airflow_init_1  | ERROR: You need to initialize the database. Please run `airflow db init`. Make sure the command is run using Airflow version 2.5.3.
airflow_init_1  | 2.5.3

...and looking into the command and /entrypoint, nothing in them ran db init (which, in hindsight, makes sense if airflow version does initialize the DB).

I was testing with both 2.5.3 and 2.8.0, thinking I was modifying the relevant things in the docker-compose file (spoiler: I wasn't). In the end I wasn't able to reproduce the issue with 2.8.0, and with 2.5.3 it failed because I was using _AIRFLOW_DB_MIGRATE: 'true' instead of _AIRFLOW_DB_UPGRADE: 'true'.

That said, the output is still different from the docs, though I'm not sure it's worth changing them. I got:

airflow_init_1  | [2024-01-05T12:52:27.886+0000] {override.py:1458} INFO - Added user airflow
airflow_init_1  | User "airflow" created with role "Admin"
airflow_init_1  | 2.8.0
airflow_init_1  exited with code 0

Also, if all Airflow containers in the docker-compose file depend on airflow-init, why do the docs say you should run it first, since it will run first anyway when you do docker compose up?

jtommi, Jan 05 '24 14:01

Those are all good questions, and I think they need proper fixes. I don't think "why" is the right question, though. A better question is "how can we improve it?", because, as you see, there are some inconsistencies, and this is a good opportunity to propose improvements (but more completely, not by just changing up to run). Would you like to take on that task, @jtommi?

potiuk, Jan 05 '24 14:01

> The airflow-init container runs airflow version which I think COULD in the past initialize the DB as side effect but this could have changed since.

I guess it still works; it is just a simple step that operates through the Airflow image entrypoint's environment variables: _AIRFLOW_DB_MIGRATE and _AIRFLOW_WWW_USER_CREATE, _AIRFLOW_WWW_USER_USERNAME, _AIRFLOW_WWW_USER_PASSWORD.
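To make the mechanism concrete, here is a hypothetical Python model of what the entrypoint does with those variables before running the container's own command (airflow version, for airflow-init). It is a simplification, not the real /entrypoint, which is a bash script that also waits for the database, validates the password and handles errors. Assumptions: any non-empty value enables a step; the _AIRFLOW_WWW_USER_FIRSTNAME, _LASTNAME, _EMAIL and _ROLE variable names and all default values are placeholders chosen for this sketch. The airflow db migrate, airflow db upgrade and airflow users create commands and their flags are real Airflow CLI commands.

```python
from typing import Dict, List, Mapping, Sequence

# Placeholder defaults for this sketch only; the real entrypoint has its own.
USER_DEFAULTS: Dict[str, str] = {
    "_AIRFLOW_WWW_USER_USERNAME": "admin",
    "_AIRFLOW_WWW_USER_FIRSTNAME": "Airflow",
    "_AIRFLOW_WWW_USER_LASTNAME": "Admin",
    "_AIRFLOW_WWW_USER_EMAIL": "admin@example.com",
    "_AIRFLOW_WWW_USER_ROLE": "Admin",
}


def entrypoint_steps(env: Mapping[str, str], command: Sequence[str]) -> List[List[str]]:
    """Return the commands a simplified entrypoint would run, in order.

    Any non-empty value enables a step; the container's own command
    (for airflow-init: airflow version) always runs last.
    """
    def get(name: str) -> str:
        return env.get(name) or USER_DEFAULTS.get(name, "")

    steps: List[List[str]] = []
    if env.get("_AIRFLOW_DB_MIGRATE"):
        steps.append(["airflow", "db", "migrate"])
    if env.get("_AIRFLOW_DB_UPGRADE"):  # older variable, the one that worked with 2.5.3 in this thread
        steps.append(["airflow", "db", "upgrade"])
    if env.get("_AIRFLOW_WWW_USER_CREATE"):
        steps.append([
            "airflow", "users", "create",
            "--username", get("_AIRFLOW_WWW_USER_USERNAME"),
            "--firstname", get("_AIRFLOW_WWW_USER_FIRSTNAME"),
            "--lastname", get("_AIRFLOW_WWW_USER_LASTNAME"),
            "--email", get("_AIRFLOW_WWW_USER_EMAIL"),
            "--role", get("_AIRFLOW_WWW_USER_ROLE"),
            "--password", get("_AIRFLOW_WWW_USER_PASSWORD"),
        ])
    steps.append(list(command))
    return steps
```

In this model the database work happens in the entrypoint, not in airflow version itself, which is why a run of airflow-init ends by printing the version. Re-running user creation is harmless in practice: the 2.5.3 to 2.8.0 upgrade log in this thread prints "airflow already exist in the db" and still exits with code 0.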

> And it seems that our tests do not actually run airflow-init - so the right fix here will be in this case to change our tests to run the init, then verify if the database have been created and only then run the whole compose setup.

It should not be necessary to run docker-compose up airflow-init explicitly: that step was only required for very old Docker Compose versions, something like below 1.29.0, which don't support the service_completed_successfully condition in depends_on.

In our tests we run docker compose up -d --wait:

https://github.com/apache/airflow/blob/5b4e95065ec860b6ea7f398fabf36ef7492b0970/docker_tests/test_docker_compose_quick_start.py#L140-L142

And according to the dependencies, it should initialise the DB after Postgres has started and before the scheduler/webserver/worker/triggerer:

graph TD;
    postgres-->airflow-init;
    redis-->airflow-init;

    airflow-init-->airflow-webserver;
    airflow-init-->airflow-scheduler;
    airflow-init-->airflow-worker;
    airflow-init-->airflow-triggerer;
    airflow-init-->airflow-cli;
    airflow-init-->flower;

    postgres-->airflow-webserver;
    postgres-->airflow-scheduler;
    postgres-->airflow-worker;
    postgres-->airflow-triggerer;
    postgres-->airflow-cli;
    postgres-->flower;

    redis-->airflow-webserver;
    redis-->airflow-scheduler;
    redis-->airflow-worker;
    redis-->airflow-triggerer;
    redis-->airflow-cli;
    redis-->flower;
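The dependency graph above can be turned into a start-up order with a topological sort. This sketch uses Python's standard graphlib module and models only the ordering, not Compose itself, which additionally waits for each depends_on condition (service_healthy for postgres and redis, service_completed_successfully for airflow-init). It assumes flower depends on redis like the other Airflow services; APP_SERVICES, EDGES and startup_waves are illustrative names.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# Services that wait for postgres, redis and airflow-init (per the graph).
APP_SERVICES = [
    "airflow-webserver", "airflow-scheduler", "airflow-worker",
    "airflow-triggerer", "airflow-cli", "flower",
]
# (dependency, dependant) pairs; flower is assumed to depend on redis like the others.
EDGES: List[Tuple[str, str]] = [("postgres", "airflow-init"), ("redis", "airflow-init")]
EDGES += [(dep, svc) for dep in ("postgres", "redis", "airflow-init") for svc in APP_SERVICES]


def startup_waves(edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Group services into waves; each wave depends only on earlier waves."""
    graph: Dict[str, Set[str]] = {}
    for dep, svc in edges:
        graph.setdefault(svc, set()).add(dep)  # svc must wait for dep
        graph.setdefault(dep, set())
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(startup_waves(EDGES), start=1):
    print(i, wave)
# 1 ['postgres', 'redis']
# 2 ['airflow-init']
# 3 ['airflow-cli', 'airflow-scheduler', 'airflow-triggerer', 'airflow-webserver', 'airflow-worker', 'flower']
```

Because airflow-init sits alone in the middle wave, a plain docker compose up already runs it before any other Airflow service, which is why a separate init step is only needed on Compose versions that lack service_completed_successfully.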

Taragolis, Jan 05 '24 19:01

> And according to the dependencies it should run initialise DB after Postgres started and before scheduler/webserver/worker/triggerer

Ah... so it looks like I am still stuck with that very old docker-compose of ours. Should we just... remove the separate step?

potiuk, Jan 05 '24 19:01

I guess so, because we now require Docker Compose V2.

My personal docker compose file for testing internal things on RC versions of Airflow/Providers is also a bit out of date 🤣

BTW, I've tested the docker compose files released with 2.5.3 and 2.8.0, and they work without any issues with these specific versions.

Airflow 2.5.3 from scratch

❯ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.3/docker-compose.yaml'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11576  100 11576    0     0  54362      0 --:--:-- --:--:-- --:--:-- 54603

❯ docker compose down --volumes --remove-orphans
[+] Running 5/5
 ✔ Container airflow-native-airflow-init-1   Removed                                                                                                                            0.0s 
 ✔ Container airflow-native-postgres-1       Removed                                                                                                                            0.3s 
 ✔ Container airflow-native-redis-1          Removed                                                                                                                            0.3s 
 ✔ Volume airflow-native_postgres-db-volume  Removed                                                                                                                            0.1s 
 ✔ Network airflow-native_default            Removed                                                                                                                            0.1s 

❯ docker compose up airflow-init
[+] Building 0.0s (0/0)                                                                                                                                         docker:desktop-linux
[+] Running 5/4
 ✔ Network airflow-native_default              Created                                                                                                                          0.0s 
 ✔ Volume "airflow-native_postgres-db-volume"  Created                                                                                                                          0.0s 
 ✔ Container airflow-native-redis-1            Created                                                                                                                          0.0s 
 ✔ Container airflow-native-postgres-1         Created                                                                                                                          0.0s 
 ✔ Container airflow-native-airflow-init-1     Created                                                                                                                          0.0s 
Attaching to airflow-native-airflow-init-1
...
airflow-native-airflow-init-1  | [2024-01-05 19:17:11,067] {manager.py:562} INFO - Added Permission can delete on DAGs to role Admin
airflow-native-airflow-init-1  | [2024-01-05 19:17:11,147] {manager.py:212} INFO - Added user airflow
airflow-native-airflow-init-1  | User "airflow" created with role "Admin"
airflow-native-airflow-init-1  | 2.5.3
airflow-native-airflow-init-1 exited with code 0

Upgrade from 2.5.3 to 2.8.0

❯ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10940  100 10940    0     0  65989      0 --:--:-- --:--:-- --:--:-- 65903

❯ docker compose up airflow-init
[+] Building 0.0s (0/0)                                                                                                                                         docker:desktop-linux
[+] Running 3/3
 ✔ Container airflow-native-redis-1         Running                                                                                                                             0.0s 
 ✔ Container airflow-native-postgres-1      Running                                                                                                                             0.0s 
 ✔ Container airflow-native-airflow-init-1  Recreated                                                                                                                           0.5s 
Attaching to airflow-native-airflow-init-1
airflow-native-airflow-init-1  | The container is run as root user. For security, consider using a regular user account.
airflow-native-airflow-init-1  | 
airflow-native-airflow-init-1  | DB: postgresql+psycopg2://airflow:***@postgres/airflow
airflow-native-airflow-init-1  | Performing upgrade to the metadata database postgresql+psycopg2://airflow:***@postgres/airflow
airflow-native-airflow-init-1  | [2024-01-05T19:18:36.404+0000] {migration.py:213} INFO - Context impl PostgresqlImpl.
airflow-native-airflow-init-1  | [2024-01-05T19:18:36.405+0000] {migration.py:216} INFO - Will assume transactional DDL.
airflow-native-airflow-init-1  | [2024-01-05T19:18:36.409+0000] {db.py:1615} INFO - Creating tables
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Will assume transactional DDL.
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 290244fb8b83 -> 6abdffdd4815, add dttm index on log table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 6abdffdd4815 -> 98ae134e6fff, Increase length of user identifier columns in ``ab_user`` and ``ab_register_user`` tables
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 98ae134e6fff -> c804e5c76e3e, Add ``onupdate`` cascade to ``task_map`` table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade c804e5c76e3e -> 937cbd173ca1, Add index to task_instance table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 937cbd173ca1 -> 788397e78828, Add custom_operator_name column
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 788397e78828 -> 405de8318b3a, add include_deferred column to pool
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 405de8318b3a -> 375a816bbbf4, add new field 'clear_number' to dagrun
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 375a816bbbf4 -> f7bf2a57d0a6, Add owner_display_name to (Audit) Log table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade f7bf2a57d0a6 -> bd5dfbe21f88, Make connection login/password TEXT
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade bd5dfbe21f88 -> 10b52ebd31f7, Add processor_subdir to ImportError.
airflow-native-airflow-init-1  | Database migrating done!
airflow-native-airflow-init-1  | /home/airflow/.local/lib/python3.8/site-packages/flask_limiter/extension.py:336 UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
airflow-native-airflow-init-1  | airflow already exist in the db
airflow-native-airflow-init-1  | 2.8.0
airflow-native-airflow-init-1 exited with code 0

Airflow 2.8.0 from scratch

❯ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10940  100 10940    0     0  75015      0 --:--:-- --:--:-- --:--:-- 75448

❯ docker compose down --volumes --remove-orphans
[+] Running 5/5
 ✔ Container airflow-native-airflow-init-1   Removed                                                                                                                            0.0s 
 ✔ Container airflow-native-redis-1          Removed                                                                                                                            0.3s 
 ✔ Container airflow-native-postgres-1       Removed                                                                                                                            0.2s 
 ✔ Volume airflow-native_postgres-db-volume  Removed                                                                                                                            0.1s 
 ✔ Network airflow-native_default            Removed                                                                                                                            0.1s 

❯ docker compose up airflow-init
[+] Building 0.0s (0/0)                                                                                                                                         docker:desktop-linux
[+] Running 5/4
 ✔ Network airflow-native_default              Created                                                                                                                          0.0s 
 ✔ Volume "airflow-native_postgres-db-volume"  Created                                                                                                                          0.0s 
 ✔ Container airflow-native-redis-1            Created                                                                                                                          0.1s 
 ✔ Container airflow-native-postgres-1         Created                                                                                                                          0.1s 
 ✔ Container airflow-native-airflow-init-1     Created                                                                                                                          0.0s 
Attaching to airflow-native-airflow-init-1
...
airflow-native-airflow-init-1  | [2024-01-05T19:21:40.257+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
airflow-native-airflow-init-1  | [2024-01-05T19:21:40.852+0000] {override.py:1458} INFO - Added user airflow
airflow-native-airflow-init-1  | User "airflow" created with role "Admin"
airflow-native-airflow-init-1  | 2.8.0
airflow-native-airflow-init-1 exited with code 0

Taragolis, Jan 05 '24 19:01

So yeah, in this case I'd simply remove the "init" instructions.

potiuk, Jan 05 '24 19:01

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions[bot], Feb 20 '24 00:02