
How to... Make Airflow fast again.

Open infused-kim opened this issue 5 years ago • 34 comments

Airflow has been constantly using 90% of my server's CPU, but I was able to resolve it and bring it down to around 5-10%. I wanted to document and share it in case someone else has the same issue.

Below is everything I had to do to get Airflow to work well on a standard DigitalOcean droplet with 2 GB of RAM and 1 vCPU.

If you have any other performance tuning tips, please share.

1. Scheduler Settings

AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=60 # Prevent airflow from reloading the dags all the time. This is the main setting that reduces CPU load in the scheduler
AIRFLOW__SCHEDULER__SCHEDULER_MAX_THREADS=1 # This should be set to (CPU Cores - 1)
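To double check that these overrides actually took effect, you can query Airflow's config API from a Python shell inside the container. A minimal sketch (the expected value is just the one set above):

# Sanity check: confirm the env-var override is what the scheduler will use.
from airflow.configuration import conf

print(conf.getint("scheduler", "min_file_process_interval"))  # expect 60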

2. Fix airflow 1.10 scheduler bug

The AIRFLOW-2895 bug in airflow 1.10 causes high CPU load, because the scheduler keeps looping without a break.

It's already fixed in master and will be included in airflow 1.10.1, but it could take weeks or months until it's released. In the meantime this patch solves the issue:

Add this patch file to config/af_1.10_high_cpu.patch:

--- jobs.py.orig    2018-09-08 15:55:03.448834310 +0000
+++ jobs.py     2018-09-08 15:57:02.847751035 +0000
@@ -564,6 +564,7 @@

         self.num_runs = num_runs
         self.run_duration = run_duration
+        self._processor_poll_interval = 1.0

         self.do_pickle = do_pickle
         super(SchedulerJob, self).__init__(*args, **kwargs)
@@ -1724,6 +1725,8 @@
             loop_end_time = time.time()
             self.log.debug("Ran scheduling loop in %.2f seconds",
                            loop_end_time - loop_start_time)
+            self.log.debug("Sleeping for %.2f seconds", self._processor_poll_interval)
+            time.sleep(self._processor_poll_interval)

             # Exit early for a test mode
             if processor_manager.max_runs_reached():

Then create your own Dockerfile that uses this image as the base:

# Custom Dockerfile
FROM puckel/docker-airflow:1.10.0-2

# Install mssql support & dag dependencies
USER root
RUN apt-get update -yqq \
    && apt-get install -y gcc freetds-dev \
    && apt-get install -y git procps \
    && apt-get install -y vim \
    && pip install 'pymssql==2.1.1' \
    && pip install apache-airflow[mssql,slack] azure 

# This fixes permission issues on linux. 
# The airflow user should have the same UID as the user running docker on the host system.
# To set it automatically, run 'docker build --build-arg DOCKER_UID=`id -u` --rm -t my-airflow .'
ARG DOCKER_UID
RUN \
    : "${DOCKER_UID:?Build argument DOCKER_UID needs to be set and non-empty.}" \
    && usermod -u ${DOCKER_UID} airflow \
    && echo "Set airflow's uid to ${DOCKER_UID}"

# This fixes high CPU load in airflow 1.10
COPY config/af_1.10_high_cpu.patch /root/af_1.10_high_cpu.patch
RUN patch -d /usr/local/lib/python3.6/site-packages/airflow/ < /root/af_1.10_high_cpu.patch; \
    rm /root/af_1.10_high_cpu.patch

USER airflow

3. RBAC webserver high CPU load

If you upgraded to use the new RBAC webserver UI, you may also notice that the webserver is using a lot of CPU persistently.

For some reason the RBAC interface uses a lot of CPU on startup. If you are running on a low powered server, this can cause a very slow webserver startup and permanently high CPU usage.

I have documented this bug as AIRFLOW-3037. To solve it you can adjust the config:

AIRFLOW__WEBSERVER__WORKERS=2 # 2 * NUM_CPU_CORES + 1
AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800 # Restart workers every 30 min instead of every 30 seconds
AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT=300 # Kill workers if they don't start within 5 min instead of 2 min
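The worker count follows gunicorn's usual 2 * cores + 1 rule of thumb; on a small droplet you may want even fewer, as above. A throwaway sketch for computing the suggested value (illustration only, not an Airflow API):

# Gunicorn-style rule of thumb for the webserver worker count.
import multiprocessing

print("suggested workers:", 2 * multiprocessing.cpu_count() + 1)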

Enabling RBAC is also buggy, but you can find instructions on how to fix it here.

infused-kim avatar Sep 19 '18 11:09 infused-kim

Thanks for the link to the airflow issue. 1.10.2 is now out, which fixes (2) at least. After upgrading, CPU usage has gone down a lot, so I'm not sure the settings still need changing.

jamesjrg avatar Feb 06 '19 14:02 jamesjrg

We moved to AirFlow 1.10.2 to resolve the CPU usage, and the good news is that the issue we had got fixed in our environment. However, we have observed that although the DAG's tasks get submitted and show as running on the AirFlow dashboard, they hold up on actual processing and appear to remain in the queue for about 60 seconds before the actual execution happens. Please note that for our use case implementation:

  • The AirFlow DAGs are not time-dependent, i.e. they are not 'Scheduled DAGs' but are triggered via Python code.
  • AirFlow v1.10.2 is being used as a single standalone installation [executor = LocalExecutor].

The Python code watches a directory for any file(s) that arrive. For each file it observes, the code triggers the AirFlow DAG. We get bundles of files arriving, so at any given instant we have scenarios where multiple instances of the same DAGs are being invoked [code snippet provided below]. The triggered DAG has a task that calls Python code to start a Kubernetes pod where some file-related processing happens. Please find below an excerpt from the DAG code:

positional_to_ascii = BashOperator(
                    task_id="uncompress_the_file",
                    bash_command='python3.6 ' + os.path.join(cons.CODE_REPO, '/app/Code/k8Job/create_kubernetes_job.py') + ' POS-PREPROCESSING {{ dag_run.conf["inputfilepath"] }} {{ dag_run.conf["frt_id"]}}',
                    execution_timeout=None,
                    dag=dag)

Once this task completes, it triggers another DAG, which has a task that processes data from the output of the previous DAG.
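For reference, a minimal sketch of the kind of watcher loop described above, assuming the experimental REST API that ships with 1.10 (the directory path and polling interval are made up for illustration):

# Hypothetical file watcher: triggers one DAG run per new file via the
# experimental REST API (POST /api/experimental/dags/<DAG_ID>/dag_runs).
import os
import time

import requests

WATCH_DIR = "/data/incoming"  # made-up path
TRIGGER_URL = "http://localhost:8080/api/experimental/dags/trigger_cleansing/dag_runs"

seen = set()
while True:
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if path not in seen:
            seen.add(path)
            requests.post(TRIGGER_URL, json={"conf": {"inputfilepath": path}})
    time.sleep(5)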

Please find below a few details of our config file parameters which may assist in assessing the root cause.

min_file_process_interval = 60 
dag_dir_list_interval = 300 
max_threads = 2
dag_concurrency = 16
worker_concurrency = 16
max_active_runs_per_dag = 16
parallelism = 32
sql_alchemy_conn = mysql://airflow:fewfw324$gG@someXserver:3306/airflow
executor = LocalExecutor

The DagBag parsing time is 1.305286. Please also find below a snapshot of the output of airflow list_dags -r:

-------------------------------------------------------------------
DagBag loading stats for /root/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 7
Total task number: 23
DagBag parsing time: 1.305286
------------------------------+----------+---------+----------+------------------------------
file                          | duration | dag_num | task_num | dags
------------------------------+----------+---------+----------+------------------------------
/trigger_cleansing.py         | 0.876388 |       1 |        5 | ['trigger_cleansing']
/processing_ebcdic_trigger.py | 0.383038 |       1 |        1 | ['processing_ebcdic_trigger']
/prep_preprocess_dag.py       | 0.015474 |       1 |        6 | ['prep_preprocess_dag']
/prep_scale_dag.py            | 0.012098 |       1 |        6 | ['dataprep_scale_dag']
/mvp.py                       | 0.010832 |       1 |        2 | ['dg_a']
/prep_uncompress_dag.py       | 0.004142 |       1 |        2 | ['prep_unzip_decrypt_dag']
/prep_positional_trigger.py   | 0.003314 |       1 |        1 | ['prep_positional_trigger']
------------------------------+----------+---------+----------+------------------------------

Below is the status of the airflow-scheduler service which is showing multiple processes

systemctl status airflow-scheduler
● airflow-scheduler.service - Airflow scheduler daemon
   Loaded: loaded (/etc/systemd/system/airflow-scheduler.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-03-09 04:44:29 EST; 33min ago
 Main PID: 37409 (airflow)
   CGroup: /system.slice/airflow-scheduler.service
           ├─37409 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37684 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37685 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37686 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37687 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37688 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37689 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37690 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37691 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37692 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37693 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37694 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37695 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37696 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37697 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37699 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37700 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37701 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37702 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37703 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37704 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37705 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37706 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37707 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37708 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37709 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37710 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37712 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37713 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37714 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37715 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37717 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37718 /usr/bin/python3.6 /bin/airflow scheduler
           └─37722 /usr/bin/python3.6 /bin/airflow scheduler

Now, since we have several files coming in, the DAGs are constantly being fired, and plenty of DAG tasks get into a waiting stage. Strangely, we didn't have this issue when we were using v1.9. Please advise.

shanit-saha avatar Mar 09 '19 10:03 shanit-saha

I realized that in the 'airflow.cfg' file, setting the value of 'min_file_process_interval' to zero resolved the problem I reported here.

shanit-saha avatar Mar 11 '19 13:03 shanit-saha

@shanit-saha min_file_process_interval should be a big number, not zero, for better performance. I am facing the same issue you mentioned above: so many scheduler threads are running in the background. I am not sure whether this is expected behavior. Can you shed some light on this issue?

anooprp avatar Mar 29 '19 09:03 anooprp

@anooprp: I would recommend you take a look at the contents of this link: Click Here. Moreover, please elaborate on your problem statement. For us, moving to Airflow_v1.10.2 resolved the slowness and high CPU usage.

shanit-saha avatar Mar 29 '19 10:03 shanit-saha

EDIT: The problem was mounting our DAGs on EFS. Airflow was maxing out the IO and using all of our burst credits. We manually set the IO speed to 1 MB/s and it sped right up.

We moved to puckel/Airflow-1.10.2 to try to resolve poor performance we've had in multiple environments. We are running Airflow 1.10.2 on AWS ECS. Interestingly, the CPU/mem never jump above 80%. The Airflow metadata DB stays very underutilized as well.

Below I've listed the configuration we're using, the DagBag parsing time plus the detailed execution times from the cProfile output of just running DagBag() in pure Python.

A few of our DAGs import a function from create_subdag_functions.py that returns a DAG we use in 12 DAGs. Most of those DAGs and their corresponding subdags only run on the hour, but 1 DAG / 3 Subdags run every 10 minutes.

max_threads = 2
dag_dir_list_interval = 300 
dag_concurrency = 16
worker_concurrency = 16
max_active_runs_per_dag = 16
parallelism = 32
executor = CeleryExecutor

Some observations:

  • airflow list_dags -r also takes a very long time and parses the example DAGs even though they are disabled. The time to parse each DAG jumps around.
  • The duration for each DAG is inconsistent (but this only applies to our DAGs and not the examples)
  • There is usually a big jump in parsing time, e.g. 5 dags will have durations < 1 and then the next 4 will have durations 20+
  • When we profiled the DagBag() function with cProfile, we found that DagBag() spent most of its time in the airflow.utils.dag_processing.list_py_file_paths function, probably due to 50+ SQL files in our /usr/local/airflow/dags folder (see the .airflowignore sketch after this list)
  • Looking through the Landing Times, the task time jumped an order of magnitude between two specific runs. I've tried to look through logs, etc. and there is nothing notable between the two runs. I've attached the image at the bottom. This performance loss was in Airflow 1.10.0
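One way to keep those non-DAG files out of the scan is an .airflowignore file in the dags folder; each line is a regular expression for paths the DagBag should skip. A sketch, assuming the .sql files never need to be parsed as DAGs, in /usr/local/airflow/dags/.airflowignore:

.*\.sql$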

Solutions I've tried:

  • increasing/lowering max_threads
  • increasing/eliminating min_file_process_interval
  • clearing the airflow database of all DAGs and reloading
  • shutting down and redeploying the environment
-------------------------------------------------------------------
DagBag loading stats for /usr/local/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 42
Total task number: 311
DagBag parsing time: 189.77048399999995
--------------------------------------------+--------------------+---------+----------+------------------------------------------------------------------------------------------------------------
file                                        | duration           | dag_num | task_num | dags
--------------------------------------------+--------------------+---------+----------+------------------------------------------------------------------------------------------------------------
/dag1.py                                    | 60.576728          |       1 |       21 | ['dag1']
/dag2.py                                    | 55.092603999999994 |       1 |       28 | ['dag2']
/dag3.py                                    | 47.997972000000004 |       1 |       17 | ['dag3']
/dag4.py                                    | 22.99313           |       3 |       16 | ['dag4', 'dag4.subdag1', 'dag4.subdag2']
/dag5.py                                    | 0.67               |       1 |       21 | ['dag5']
/dag6.py                                    | 0.652114           |       1 |        9 | ['dag6']
/dag7.py                                    | 0.45368            |       1 |       26 | ['dag7']
/dag8.py                                    | 0.396908           |       5 |       40 | ['dag8', 'dag8.subdag1', 'dag8.subdag2', 'dag8.subdag3', 'dag8.subdag4']
/dag9.py                                    | 0.242012           |       6 |       38 | ['dag9', 'dag9.subdag1', 'dag9.subdag2', 'dag9.subdag3', 'dag9.subdag4', 'dag9.subdag5']
/dag10.py                                   | 0.134342           |       1 |        1 | ['dag10']
/dag11.py                                   | 0.13325            |       2 |        8 | ['dag11', 'dag12.subdag1']
/dag12.py                                   | 0.10562            |       1 |        6 | ['dag12']
/create_subdag_functions.py                 | 0.105292           |       0 |        0 | []
 example_http_operator.py                   | 0.040636           |       1 |        6 | ['example_http_operator']
 example_subdag_operator.py                 | 0.005328           |       3 |       15 | ['example_subdag_operator', 'example_subdag_operator.section-1', 'example_subdag_operator.section-2']
 example_bash_operator.py                   | 0.004052           |       1 |        6 | ['example_bash_operator']
 example_branch_operator.py                 | 0.003444           |       1 |       11 | ['example_branch_operator']
 example_branch_python_dop_operator_3.py    | 0.003418           |       1 |        3 | ['example_branch_dop_operator_v3']
 example_passing_params_via_test_command.py | 0.003222           |       1 |        2 | ['example_passing_params_via_test_command']
 example_skip_dag.py                        | 0.002386           |       1 |        8 | ['example_skip_dag']
 example_trigger_controller_dag.py          | 0.002386           |       1 |        1 | ['example_trigger_controller_dag']
 example_short_circuit_operator.py          | 0.002344           |       1 |        6 | ['example_short_circuit_operator']
 example_python_operator.py                 | 0.002218           |       1 |        6 | ['example_python_operator']
 example_latest_only.py                     | 0.002196           |       1 |        2 | ['latest_only']
 example_latest_only_with_trigger.py        | 0.001848           |       1 |        5 | ['latest_only_with_trigger']
 example_xcom.py                            | 0.001722           |       1 |        3 | ['example_xcom']
 docker_copy_data.py                        | 0.001718           |       0 |        0 | []
 example_trigger_target_dag.py              | 0.001704           |       1 |        2 | ['example_trigger_target_dag']
 tutorial.py                                | 0.00165            |       1 |        3 | ['tutorial']
 test_utils.py                              | 0.001376           |       1 |        1 | ['test_utils']
 example_docker_operator.py                 | 0.00103            |       0 |        0 | []
 subdags/subdag.py                          | 0.001016           |       0 |        0 | []
-------------------------------------------------------------------------------------------------------+--------------------+---------+----------+--------------------------------------------------
-------------------------------------------------------------------
DagBag loading stats for /usr/local/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 42
Total task number: 311
DagBag parsing time: 296.5826819999999
------------------------------+--------------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
file                          | duration           | dag_num | task_num | dags
------------------------------+--------------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/dag1.py                      | 74.819988          |       1 |       21 | ['dag1']
/dag3.py                      | 53.193430000000006 |       1 |       17 | ['dag3']
/dag8.py                      | 34.535742          |       5 |       40 | ['dag8', 'dag8.subdag1', 'dag8.subdag2', 'dag8.subdag3', 'dag8.subdag4']
/dag4.py                      | 21.543944000000003 |       6 |       38 | ['dag9', 'dag9.subdag1', 'dag9.subdag2', 'dag9.subdag3', 'dag9.subdag4', 'dag9.subdag5']
/dag5.py                      | 18.458316000000003 |       3 |       16 | ['dag4', 'dag4.subdag1', 'dag4.subdag2']
/create_subdag_functions.py   | 14.652806000000002 |       0 |        0 | []
/dag7.py                      | 13.051984000000001 |       2 |        8 | ['dag11', 'dag11.subdag1']
/dag8.py                      | 10.02703           |       1 |       21 | ['dag5']
/dag9.py                      | 9.834226000000001  |       1 |        1 | ['dag10']
/dag10.py                     | 9.575258000000002  |       1 |       28 | ['dag2']
/dag11.py                     | 9.418897999999999  |       1 |        9 | ['dag6']
/dag12.py                     | 9.319210000000002  |       1 |        6 | ['dag12']
/dag13.py                     | 8.686964           |       1 |       26 | ['dag7']

Note: removed the example DAGs from the second output for brevity

cProfile output of from airflow.models import DagBag; DagBag():

{{settings.py:174}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=6740
{{__init__.py:51}} INFO - Using executor SequentialExecutor
{{models.py:273}} INFO - Filling up the DagBag from 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      997  443.441    0.445  443.441    0.445 {built-in method io.open}
      198  186.978    0.944  483.629    2.443 zipfile.py:198(is_zipfile)
      642   65.069    0.101   65.069    0.101 {method 'close' of '_io.BufferedReader' objects}
     1351   45.924    0.034   45.946    0.034 <frozen importlib._bootstrap_external>:830(get_data)
     7916   39.403    0.005   39.403    0.005 {built-in method posix.stat}
      2/1   22.927   11.464  544.419  544.419 dag_processing.py:220(list_py_file_paths)
       33   18.992    0.576  289.797    8.782 models.py:321(process_file)
       22    8.723    0.397    8.723    0.397 {built-in method posix.scandir}
      412    2.379    0.006    2.379    0.006 {built-in method posix.listdir}
        9    1.301    0.145    3.058    0.340 linecache.py:82(updatecache)
 1682/355    0.186    0.000    0.731    0.002 sre_parse.py:470(_parse)
     1255    0.183    0.000    0.183    0.000 {built-in method marshal.loads}
 3092/325    0.143    0.000    0.647    0.002 sre_compile.py:64(_compile)
       59    0.139    0.002    0.139    0.002 {built-in method builtins.compile}
    25270    0.134    0.000    0.210    0.000 sre_parse.py:253(get)
    52266    0.132    0.000    0.132    0.000 {method 'append' of 'list' objects}
4210/4145    0.131    0.000    1.760    0.000 {built-in method builtins.__build_class__}
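For reference, output like the above can be reproduced with something like the following (sorting by total time is an assumption about how this particular table was generated):

# Profile DagBag construction and print hotspots, slowest first.
import cProfile
from airflow.models import DagBag

cProfile.run("DagBag()", sort="tottime")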

Airflow performance drop:

[image: Landing Times chart showing the performance drop]

sethmenghi avatar Apr 03 '19 19:04 sethmenghi

My min_file_process_interval is set to 0, yet consecutive dag_runs still wait for 90 seconds before starting.

How can I fix that?

abhineetraj avatar Apr 25 '19 00:04 abhineetraj

@abhineetraj Do you have that issue in Airflow 1.10.3? Because I have the same issue and I do run 1.10.3.

I'm in the process of choosing a workflow engine that needs to have a clear GUI for monitoring purposes. Airflow seems to be what we need in terms of user interface, although it's very sluggish. For a simple task like the templated tutorial task (here), I expect it to finish within milliseconds.

Details below.

Here's the script I use for stress testing:

import datetime
import multiprocessing
import requests


url = 'http://airflow.localhost:8080/api/experimental/dags/tutorial/dag_runs'
headers = {
    'Cache-Control': 'no-cache',
    'Content-Type': 'application/json',
}


def test(instance, runs):
    for x in range(runs):
        payload = {
            'replace_microseconds': False,
        }

        response = requests.post(url, headers=headers, json=payload)
        print(response.text)
        if not response.ok:
            return


def run_tests(threads):
    thread_pool = []
    start_batch = datetime.datetime.now()
    print('Test: Calling Airflow tutorial DAG')
    print('Running test with {} threads'.format(threads))

    for thread in range(threads):
        thread_pool.append(multiprocessing.Process(target=test, args=(thread, 20)))
        thread_pool[thread].start()

    for thread in range(threads):
        thread_pool[thread].join()

    end_batch = datetime.datetime.now()
    return "total duration: {:7.5f}s".format(
        end_batch.timestamp() - start_batch.timestamp()
    )


if __name__ == '__main__':
    result = run_tests(threads=2)
    print(result)

The DAG/pipeline is exactly the same as the first code block in the tutorial I linked above.

Here's a random log for the templated task, note the timestamps and the time in the log filename:

*** Reading local file: /root/airflow/logs/tutorial/templated/2019-06-28T07:38:16.058941+00:00/1.log
[2019-06-28 07:39:58,364] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: tutorial.templated 2019-06-28T07:38:16.058941+00:00 [queued]>
[2019-06-28 07:39:58,395] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: tutorial.templated 2019-06-28T07:38:16.058941+00:00 [queued]>
[2019-06-28 07:39:58,395] {__init__.py:1353} INFO - 
--------------------------------------------------------------------------------
[2019-06-28 07:39:58,395] {__init__.py:1354} INFO - Starting attempt 1 of 2
[2019-06-28 07:39:58,395] {__init__.py:1355} INFO - 
--------------------------------------------------------------------------------
[2019-06-28 07:39:58,411] {__init__.py:1374} INFO - Executing <Task(BashOperator): templated> on 2019-06-28T07:38:16.058941+00:00
[2019-06-28 07:39:58,411] {base_task_runner.py:119} INFO - Running: ['airflow', 'run', 'tutorial', 'templated', '2019-06-28T07:38:16.058941+00:00', '--job_id', '1412', '--raw', '-sd', '/usr/local/lib/python3.6/site-packages/airflow/example_dags/tutorial.py', '--cfg_path', '/tmp/tmpxo1tgu19']
[2019-06-28 07:40:00,559] {base_task_runner.py:101} INFO - Job 1412: Subtask templated [2019-06-28 07:40:00,559] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=4338
[2019-06-28 07:40:01,107] {base_task_runner.py:101} INFO - Job 1412: Subtask templated [2019-06-28 07:40:01,106] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-06-28 07:40:01,716] {base_task_runner.py:101} INFO - Job 1412: Subtask templated [2019-06-28 07:40:01,714] {__init__.py:305} INFO - Filling up the DagBag from /usr/local/lib/python3.6/site-packages/airflow/example_dags/tutorial.py
[2019-06-28 07:40:01,786] {base_task_runner.py:101} INFO - Job 1412: Subtask templated [2019-06-28 07:40:01,784] {cli.py:517} INFO - Running <TaskInstance: tutorial.templated 2019-06-28T07:38:16.058941+00:00 [running]> on host 3063ba2ca0c5
[2019-06-28 07:40:01,887] {bash_operator.py:81} INFO - Tmp dir root location: 
 /tmp
[2019-06-28 07:40:01,888] {bash_operator.py:90} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=tutorial
AIRFLOW_CTX_TASK_ID=templated
AIRFLOW_CTX_EXECUTION_DATE=2019-06-28T07:38:16.058941+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2019-06-28T07:38:16.058941+00:00
[2019-06-28 07:40:01,888] {bash_operator.py:104} INFO - Temporary script location: /tmp/airflowtmp_umua3_o/templatedr3ouai12
[2019-06-28 07:40:01,888] {bash_operator.py:114} INFO - Running command: 

    echo "2019-06-28"
    echo "2019-07-05"
    echo "Parameter I passed in"

    echo "2019-06-28"
    echo "2019-07-05"
    echo "Parameter I passed in"

    echo "2019-06-28"
    echo "2019-07-05"
    echo "Parameter I passed in"

    echo "2019-06-28"
    echo "2019-07-05"
    echo "Parameter I passed in"

    echo "2019-06-28"
    echo "2019-07-05"
    echo "Parameter I passed in"

[2019-06-28 07:40:01,945] {bash_operator.py:123} INFO - Output:
[2019-06-28 07:40:01,947] {bash_operator.py:127} INFO - 2019-06-28
[2019-06-28 07:40:01,947] {bash_operator.py:127} INFO - 2019-07-05
[2019-06-28 07:40:01,947] {bash_operator.py:127} INFO - Parameter I passed in
[2019-06-28 07:40:01,947] {bash_operator.py:127} INFO - 2019-06-28
[2019-06-28 07:40:01,947] {bash_operator.py:127} INFO - 2019-07-05
[2019-06-28 07:40:01,948] {bash_operator.py:127} INFO - Parameter I passed in
[2019-06-28 07:40:01,948] {bash_operator.py:127} INFO - 2019-06-28
[2019-06-28 07:40:01,948] {bash_operator.py:127} INFO - 2019-07-05
[2019-06-28 07:40:01,948] {bash_operator.py:127} INFO - Parameter I passed in
[2019-06-28 07:40:01,948] {bash_operator.py:127} INFO - 2019-06-28
[2019-06-28 07:40:01,948] {bash_operator.py:127} INFO - 2019-07-05
[2019-06-28 07:40:01,948] {bash_operator.py:127} INFO - Parameter I passed in
[2019-06-28 07:40:01,949] {bash_operator.py:127} INFO - 2019-06-28
[2019-06-28 07:40:01,949] {bash_operator.py:127} INFO - 2019-07-05
[2019-06-28 07:40:01,949] {bash_operator.py:127} INFO - Parameter I passed in
[2019-06-28 07:40:01,949] {bash_operator.py:131} INFO - Command exited with return code 0
[2019-06-28 07:40:03,358] {logging_mixin.py:95} INFO - [2019-06-28 07:40:03,356] {jobs.py:2562} INFO - Task exited with return code 0

The people at StackOverflow point out that Airflow is not meant as a real-time scheduling engine, which is something I should have realized sooner.

jaharkema avatar Jun 28 '19 08:06 jaharkema

Hi, I'm a bit of a newbie, but I've recently been tasked with containerizing Airflow for our company. I've been running into some performance issues and observed something of interest. Mainly: min_file_process_interval is documented to control when a new/updated file gets picked up by Airflow, but it also seems to control when a queued job is executed. I've changed this value to check whether this is the case, and I've consistently found this config to be correlated with job queue times.

This is a bit of a problem at larger scale, since if you have many dags, DAG refresh becomes more expensive; but if you have many dags, you also want shorter intervals at which queued jobs can be picked up when workers are available. Since one config controls both behaviors, you have to make a compromise somewhere, and that compromise could be a deal breaker if you are resource constrained, such as on AWS Fargate (max 4 vCPU per task).

I am considering a few solutions... but before I do, has anyone scaled Airflow with 100+ dags using this fargate solution?

edkimsg avatar Jul 12 '19 15:07 edkimsg

Hi, I read the following:

Airflow Variables are stored in the Metadata Database, so any call to a variable means a connection to the Metadata DB. Your DAG files are parsed every X seconds. Using a large number of variables in your DAG (and worse, in default_args) may mean you end up saturating the number of allowed connections to your database. To avoid this situation, you can use a single Airflow variable with a JSON value. Since an Airflow variable can contain a JSON value, you can store all your DAG configuration inside a single variable, as shown in the image below.
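A minimal sketch of that single-JSON-variable pattern (the variable name and keys are made up):

# One metadata-DB round trip for the whole DAG config instead of one per variable.
from airflow.models import Variable

# Assumes a variable "my_dag_config" holding e.g. {"schedule": "@hourly", "retries": 2}
config = Variable.get("my_dag_config", deserialize_json=True)
print(config["schedule"], config["retries"])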

I can't find the settings for a variable cache in https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg

weldpua2008 avatar Aug 01 '19 07:08 weldpua2008

Guys, I must be missing something here with Airflow performance (versions 1.10.5 and 1.10.4 as well). I have made a very simple test (all with the celery executor): running the example dag (the one with 12 bash operators that sleep a random ~30 seconds). When I run 100 instances of this dag (through the REST API), the scheduler is stuck after 3 minutes.

  • It does not matter if I run locally with the celery executor, or in Astronomer Cloud with several workers.
  • It does not matter how much power I give to the scheduler/workers.
  • I have played with every possible performance param. It looks like the scheduler just can't handle 1200 tasks fired at it at once.

Can anyone help me understand? Either airflow is not working, or I'm missing something critical.

shlomiken avatar Nov 27 '19 11:11 shlomiken

I want to add another simple test that does not succeed and causes the scheduler to halt: running the tutorial dag as it's defined, with its backfill from 2015, also does not finish successfully and causes the scheduler to stop scheduling tasks. I verified this both locally with the celery executor (running 1 or more workers) and on Astronomer Cloud (with 1 or more workers).

shlomiken avatar Nov 28 '19 15:11 shlomiken

@shlomiken are you working with @ashb directly on this issue?

ryw avatar Feb 03 '20 15:02 ryw

@shlomiken are you working with @ashb directly on this issue?

Sort of, yes. He made some scheduler improvements that were supposed to fix this, but it didn't work, so I guess we'll continue to work on this.

shlomiken avatar Feb 03 '20 16:02 shlomiken

Somewhat related, as I landed on this issue: I am running this on my Mac, and the problem I was having was the default RAM allocation setting in Docker Desktop for macOS. The default is 2 GB, which was making everything slow and unresponsive.

Make this higher and everything is better again. Hopefully this can help someone.

arocketman avatar Feb 07 '20 19:02 arocketman

@shlomiken is there a fix/workaround for this issue?

gokhanoner avatar Oct 22 '20 17:10 gokhanoner

I'd recommend trying Airflow 2 Alpha, where scheduler lag has been decreased significantly. Check out the alpha announcement on the mailing list.

We expect beta in a couple of weeks, and full release of Airflow 2 before end of year (assuming everything goes as planned).

ryw avatar Oct 22 '20 17:10 ryw

@ryw Thanks, but we cannot wait till EOY, since it's a pressing issue for us. Let's see what we can do as a workaround. Any suggestions?

gokhanoner avatar Oct 22 '20 18:10 gokhanoner

@gokhanoner Luigi inside python operator - https://issues.apache.org/jira/browse/AIRFLOW-6920

tooptoop4 avatar Oct 22 '20 18:10 tooptoop4

@tooptoop4 would you test your 200 sequential tasks DAG with Airflow 2 Alpha? Would love to see how close we got to your Luigi numbers.

Also would love to chat w/ you sometime re: the control-m comparisons you put into that JIRA - send me email [email protected] if interested.

ryw avatar Oct 22 '20 19:10 ryw

@ryw time has not changed much :(

tooptoop4 avatar Oct 23 '20 23:10 tooptoop4

@tooptoop4 can you share details? Would love to have @ashb analyze

ryw avatar Oct 23 '20 23:10 ryw

task times (columns: dag, exec_date, task, state, start_date, end_date):

dag exec_date task state start_date end_date looper3 2020-10-23 21:17:13+00:00 op1 success 2020-10-23 21:17:19.463690+00:00 2020-10-23 21:17:19.719010+00:00 looper3 2020-10-23 21:17:13+00:00 op2 success 2020-10-23 21:17:24.960760+00:00 2020-10-23 21:17:25.363672+00:00 looper3 2020-10-23 21:17:13+00:00 op3 success 2020-10-23 21:17:29.391043+00:00 2020-10-23 21:17:29.648374+00:00 looper3 2020-10-23 21:17:13+00:00 op4 success 2020-10-23 21:17:34.309827+00:00 2020-10-23 21:17:34.552993+00:00 looper3 2020-10-23 21:17:13+00:00 op5 success 2020-10-23 21:17:39.645622+00:00 2020-10-23 21:17:39.941286+00:00 looper3 2020-10-23 21:17:13+00:00 op6 success 2020-10-23 21:17:43.316295+00:00 2020-10-23 21:17:43.509827+00:00 looper3 2020-10-23 21:17:13+00:00 op7 success 2020-10-23 21:17:47.796291+00:00 2020-10-23 21:17:48.069386+00:00 looper3 2020-10-23 21:17:13+00:00 op8 success 2020-10-23 21:17:52.253787+00:00 2020-10-23 21:17:52.748797+00:00 looper3 2020-10-23 21:17:13+00:00 op9 success 2020-10-23 21:17:57.365876+00:00 2020-10-23 21:17:57.626021+00:00 looper3 2020-10-23 21:17:13+00:00 op10 success 2020-10-23 21:18:01.767908+00:00 2020-10-23 21:18:02.082495+00:00 looper3 2020-10-23 21:17:13+00:00 op11 success 2020-10-23 21:18:07.664927+00:00 2020-10-23 21:18:07.912152+00:00 looper3 2020-10-23 21:17:13+00:00 op12 success 2020-10-23 21:18:12.574289+00:00 2020-10-23 21:18:12.883255+00:00 looper3 2020-10-23 21:17:13+00:00 op13 success 2020-10-23 21:18:16.605488+00:00 2020-10-23 21:18:16.880311+00:00 looper3 2020-10-23 21:17:13+00:00 op14 success 2020-10-23 21:18:20.920843+00:00 2020-10-23 21:18:21.211193+00:00 looper3 2020-10-23 21:17:13+00:00 op15 success 2020-10-23 21:18:24.799189+00:00 2020-10-23 21:18:25.031451+00:00 looper3 2020-10-23 21:17:13+00:00 op16 success 2020-10-23 21:18:28.430412+00:00 2020-10-23 21:18:28.637141+00:00 looper3 2020-10-23 21:17:13+00:00 op17 success 2020-10-23 21:18:32.405966+00:00 2020-10-23 21:18:32.699885+00:00 looper3 2020-10-23 21:17:13+00:00 op18 success 2020-10-23 21:18:37.411001+00:00 2020-10-23 21:18:37.814134+00:00 looper3 2020-10-23 21:17:13+00:00 op19 success 2020-10-23 21:18:43.072040+00:00 2020-10-23 21:18:43.402158+00:00 looper3 2020-10-23 21:17:13+00:00 op20 success 2020-10-23 21:18:46.673466+00:00 2020-10-23 21:18:46.883575+00:00 looper3 2020-10-23 21:17:13+00:00 op21 success 2020-10-23 21:18:51.438136+00:00 2020-10-23 21:18:51.704786+00:00 looper3 2020-10-23 21:17:13+00:00 op22 success 2020-10-23 21:18:57.104456+00:00 2020-10-23 21:18:57.420892+00:00 looper3 2020-10-23 21:17:13+00:00 op23 success 2020-10-23 21:19:03.474396+00:00 2020-10-23 21:19:03.650751+00:00 looper3 2020-10-23 21:17:13+00:00 op24 success 2020-10-23 21:19:07.285970+00:00 2020-10-23 21:19:07.577605+00:00 looper3 2020-10-23 21:17:13+00:00 op25 success 2020-10-23 21:19:14.311066+00:00 2020-10-23 21:19:14.647545+00:00 looper3 2020-10-23 21:17:13+00:00 op26 success 2020-10-23 21:19:19.968882+00:00 2020-10-23 21:19:20.353505+00:00 looper3 2020-10-23 21:17:13+00:00 op27 success 2020-10-23 21:19:25.210991+00:00 2020-10-23 21:19:25.485681+00:00 looper3 2020-10-23 21:17:13+00:00 op28 success 2020-10-23 21:19:30.683534+00:00 2020-10-23 21:19:30.979025+00:00 looper3 2020-10-23 21:17:13+00:00 op29 success 2020-10-23 21:19:36.380478+00:00 2020-10-23 21:19:36.613557+00:00 looper3 2020-10-23 21:17:13+00:00 op30 success 2020-10-23 21:19:40.011102+00:00 2020-10-23 21:19:40.205918+00:00 looper3 2020-10-23 21:17:13+00:00 op31 success 2020-10-23 21:19:43.770553+00:00 2020-10-23 21:19:44.057588+00:00 looper3 
2020-10-23 21:17:13+00:00 op32 success 2020-10-23 21:19:48.650588+00:00 2020-10-23 21:19:49.260186+00:00 looper3 2020-10-23 21:17:13+00:00 op33 success 2020-10-23 21:19:54.010713+00:00 2020-10-23 21:19:54.270141+00:00 looper3 2020-10-23 21:17:13+00:00 op34 success 2020-10-23 21:19:59.094805+00:00 2020-10-23 21:19:59.386073+00:00 looper3 2020-10-23 21:17:13+00:00 op35 success 2020-10-23 21:20:03.830509+00:00 2020-10-23 21:20:04.156073+00:00 looper3 2020-10-23 21:17:13+00:00 op36 success 2020-10-23 21:20:09.495010+00:00 2020-10-23 21:20:09.726999+00:00 looper3 2020-10-23 21:17:13+00:00 op37 success 2020-10-23 21:20:13.457526+00:00 2020-10-23 21:20:13.752337+00:00 looper3 2020-10-23 21:17:13+00:00 op38 success 2020-10-23 21:20:18.693657+00:00 2020-10-23 21:20:18.928057+00:00 looper3 2020-10-23 21:17:13+00:00 op39 success 2020-10-23 21:20:24.232369+00:00 2020-10-23 21:20:24.427604+00:00 looper3 2020-10-23 21:17:13+00:00 op40 success 2020-10-23 21:20:28.136594+00:00 2020-10-23 21:20:28.407731+00:00 looper3 2020-10-23 21:17:13+00:00 op41 success 2020-10-23 21:20:32.997788+00:00 2020-10-23 21:20:33.305490+00:00 looper3 2020-10-23 21:17:13+00:00 op42 success 2020-10-23 21:20:38.126380+00:00 2020-10-23 21:20:38.477851+00:00 looper3 2020-10-23 21:17:13+00:00 op43 success 2020-10-23 21:20:42.962789+00:00 2020-10-23 21:20:43.161966+00:00 looper3 2020-10-23 21:17:13+00:00 op44 success 2020-10-23 21:20:46.534901+00:00 2020-10-23 21:20:46.819542+00:00 looper3 2020-10-23 21:17:13+00:00 op45 success 2020-10-23 21:20:51.375116+00:00 2020-10-23 21:20:51.700741+00:00 looper3 2020-10-23 21:17:13+00:00 op46 success 2020-10-23 21:20:55.962618+00:00 2020-10-23 21:20:56.261827+00:00 looper3 2020-10-23 21:17:13+00:00 op47 success 2020-10-23 21:21:00.624147+00:00 2020-10-23 21:21:00.781522+00:00 looper3 2020-10-23 21:17:13+00:00 op48 success 2020-10-23 21:21:03.712178+00:00 2020-10-23 21:21:03.887514+00:00 looper3 2020-10-23 21:17:13+00:00 op49 success 2020-10-23 21:21:07.939267+00:00 2020-10-23 21:21:08.245040+00:00 looper3 2020-10-23 21:17:13+00:00 op50 success 2020-10-23 21:21:11.903376+00:00 2020-10-23 21:21:12.176705+00:00 looper3 2020-10-23 21:17:13+00:00 op51 success 2020-10-23 21:21:17.017149+00:00 2020-10-23 21:21:17.223379+00:00 looper3 2020-10-23 21:17:13+00:00 op52 success 2020-10-23 21:21:22.373650+00:00 2020-10-23 21:21:22.731415+00:00 looper3 2020-10-23 21:17:13+00:00 op53 success 2020-10-23 21:21:27.436020+00:00 2020-10-23 21:21:27.635883+00:00 looper3 2020-10-23 21:17:13+00:00 op54 success 2020-10-23 21:21:32.181431+00:00 2020-10-23 21:21:32.460210+00:00 looper3 2020-10-23 21:17:13+00:00 op55 success 2020-10-23 21:21:36.289295+00:00 2020-10-23 21:21:36.553487+00:00 looper3 2020-10-23 21:17:13+00:00 op56 success 2020-10-23 21:21:41.356729+00:00 2020-10-23 21:21:41.602672+00:00 looper3 2020-10-23 21:17:13+00:00 op57 success 2020-10-23 21:21:46.468041+00:00 2020-10-23 21:21:46.776191+00:00 looper3 2020-10-23 21:17:13+00:00 op58 success 2020-10-23 21:21:52.005937+00:00 2020-10-23 21:21:52.366119+00:00 looper3 2020-10-23 21:17:13+00:00 op59 success 2020-10-23 21:21:59.200813+00:00 2020-10-23 21:21:59.529484+00:00 looper3 2020-10-23 21:17:13+00:00 op60 success 2020-10-23 21:22:04.505996+00:00 2020-10-23 21:22:04.716499+00:00 looper3 2020-10-23 21:17:13+00:00 op61 success 2020-10-23 21:22:07.878070+00:00 2020-10-23 21:22:08.175464+00:00 looper3 2020-10-23 21:17:13+00:00 op62 success 2020-10-23 21:22:11.967519+00:00 2020-10-23 21:22:12.184914+00:00 looper3 2020-10-23 21:17:13+00:00 op63 success 2020-10-23 
21:22:15.936316+00:00 2020-10-23 21:22:16.329286+00:00 looper3 2020-10-23 21:17:13+00:00 op64 success 2020-10-23 21:22:19.653030+00:00 2020-10-23 21:22:19.836370+00:00 looper3 2020-10-23 21:17:13+00:00 op65 success 2020-10-23 21:22:23.217614+00:00 2020-10-23 21:22:23.461981+00:00 looper3 2020-10-23 21:17:13+00:00 op66 success 2020-10-23 21:22:26.985240+00:00 2020-10-23 21:22:27.301994+00:00 looper3 2020-10-23 21:17:13+00:00 op67 success 2020-10-23 21:22:30.693659+00:00 2020-10-23 21:22:30.895147+00:00 looper3 2020-10-23 21:17:13+00:00 op68 success 2020-10-23 21:22:33.517798+00:00 2020-10-23 21:22:33.750180+00:00 looper3 2020-10-23 21:17:13+00:00 op69 success 2020-10-23 21:22:38.565010+00:00 2020-10-23 21:22:38.845123+00:00 looper3 2020-10-23 21:17:13+00:00 op70 success 2020-10-23 21:22:43.183534+00:00 2020-10-23 21:22:43.511245+00:00 looper3 2020-10-23 21:17:13+00:00 op71 success 2020-10-23 21:22:46.765558+00:00 2020-10-23 21:22:46.999526+00:00 looper3 2020-10-23 21:17:13+00:00 op72 success 2020-10-23 21:22:50.910168+00:00 2020-10-23 21:22:51.162549+00:00 looper3 2020-10-23 21:17:13+00:00 op73 success 2020-10-23 21:22:56.099148+00:00 2020-10-23 21:22:56.594858+00:00 looper3 2020-10-23 21:17:13+00:00 op74 success 2020-10-23 21:23:01.855283+00:00 2020-10-23 21:23:02.300158+00:00 looper3 2020-10-23 21:17:13+00:00 op75 success 2020-10-23 21:23:07.264790+00:00 2020-10-23 21:23:07.665911+00:00 looper3 2020-10-23 21:17:13+00:00 op76 success 2020-10-23 21:23:13.564173+00:00 2020-10-23 21:23:13.921679+00:00 looper3 2020-10-23 21:17:13+00:00 op77 success 2020-10-23 21:23:18.883021+00:00 2020-10-23 21:23:19.129951+00:00 looper3 2020-10-23 21:17:13+00:00 op78 success 2020-10-23 21:23:24.336919+00:00 2020-10-23 21:23:24.652197+00:00 looper3 2020-10-23 21:17:13+00:00 op79 success 2020-10-23 21:23:28.090517+00:00 2020-10-23 21:23:28.355554+00:00 looper3 2020-10-23 21:17:13+00:00 op80 success 2020-10-23 21:23:31.630318+00:00 2020-10-23 21:23:31.901986+00:00 looper3 2020-10-23 21:17:13+00:00 op81 success 2020-10-23 21:23:35.593917+00:00 2020-10-23 21:23:35.872974+00:00 looper3 2020-10-23 21:17:13+00:00 op82 success 2020-10-23 21:23:39.972079+00:00 2020-10-23 21:23:40.189390+00:00 looper3 2020-10-23 21:17:13+00:00 op83 success 2020-10-23 21:23:43.694509+00:00 2020-10-23 21:23:43.863288+00:00 looper3 2020-10-23 21:17:13+00:00 op84 success 2020-10-23 21:23:48.705048+00:00 2020-10-23 21:23:48.952877+00:00 looper3 2020-10-23 21:17:13+00:00 op85 success 2020-10-23 21:23:53.550148+00:00 2020-10-23 21:23:54.027349+00:00 looper3 2020-10-23 21:17:13+00:00 op86 success 2020-10-23 21:23:57.413361+00:00 2020-10-23 21:23:57.601258+00:00 looper3 2020-10-23 21:17:13+00:00 op87 success 2020-10-23 21:24:03.468576+00:00 2020-10-23 21:24:03.911528+00:00 looper3 2020-10-23 21:17:13+00:00 op88 success 2020-10-23 21:24:09.937031+00:00 2020-10-23 21:24:10.390241+00:00 looper3 2020-10-23 21:17:13+00:00 op89 success 2020-10-23 21:24:15.443923+00:00 2020-10-23 21:24:15.630041+00:00 looper3 2020-10-23 21:17:13+00:00 op90 success 2020-10-23 21:24:20.183142+00:00 2020-10-23 21:24:20.496298+00:00 looper3 2020-10-23 21:17:13+00:00 op91 success 2020-10-23 21:24:27.117280+00:00 2020-10-23 21:24:27.528027+00:00 looper3 2020-10-23 21:17:13+00:00 op92 success 2020-10-23 21:24:33.003426+00:00 2020-10-23 21:24:33.431958+00:00 looper3 2020-10-23 21:17:13+00:00 op93 success 2020-10-23 21:24:37.378188+00:00 2020-10-23 21:24:37.623238+00:00 looper3 2020-10-23 21:17:13+00:00 op94 success 2020-10-23 21:24:42.427742+00:00 2020-10-23 
21:24:42.834105+00:00 looper3 2020-10-23 21:17:13+00:00 op95 success 2020-10-23 21:24:47.568129+00:00 2020-10-23 21:24:47.795832+00:00 looper3 2020-10-23 21:17:13+00:00 op96 success 2020-10-23 21:24:52.136687+00:00 2020-10-23 21:24:52.311984+00:00 looper3 2020-10-23 21:17:13+00:00 op97 success 2020-10-23 21:24:55.718066+00:00 2020-10-23 21:24:55.905306+00:00 looper3 2020-10-23 21:17:13+00:00 op98 success 2020-10-23 21:24:59.958947+00:00 2020-10-23 21:25:00.165369+00:00 looper3 2020-10-23 21:17:13+00:00 op99 success 2020-10-23 21:25:04.440921+00:00 2020-10-23 21:25:04.723288+00:00 looper3 2020-10-23 21:17:13+00:00 op100 success 2020-10-23 21:25:09.751764+00:00 2020-10-23 21:25:10.037638+00:00 looper3 2020-10-23 21:17:13+00:00 op101 success 2020-10-23 21:25:19.304804+00:00 2020-10-23 21:25:19.912870+00:00 looper3 2020-10-23 21:17:13+00:00 op102 success 2020-10-23 21:25:27.410281+00:00 2020-10-23 21:25:27.735394+00:00 looper3 2020-10-23 21:17:13+00:00 op103 success 2020-10-23 21:25:35.689061+00:00 2020-10-23 21:25:36.013752+00:00 looper3 2020-10-23 21:17:13+00:00 op104 success 2020-10-23 21:25:41.886769+00:00 2020-10-23 21:25:42.320773+00:00 looper3 2020-10-23 21:17:13+00:00 op105 success 2020-10-23 21:25:47.991648+00:00 2020-10-23 21:25:48.394504+00:00 looper3 2020-10-23 21:17:13+00:00 op106 success 2020-10-23 21:25:54.393255+00:00 2020-10-23 21:25:54.610057+00:00 looper3 2020-10-23 21:17:13+00:00 op107 success 2020-10-23 21:25:59.782844+00:00 2020-10-23 21:26:00.196865+00:00 looper3 2020-10-23 21:17:13+00:00 op108 success 2020-10-23 21:26:07.593140+00:00 2020-10-23 21:26:07.861633+00:00 looper3 2020-10-23 21:17:13+00:00 op109 success 2020-10-23 21:26:14.494754+00:00 2020-10-23 21:26:14.813258+00:00 looper3 2020-10-23 21:17:13+00:00 op110 success 2020-10-23 21:26:21.374982+00:00 2020-10-23 21:26:22.091610+00:00 looper3 2020-10-23 21:17:13+00:00 op111 success 2020-10-23 21:26:28.164956+00:00 2020-10-23 21:26:28.502897+00:00 looper3 2020-10-23 21:17:13+00:00 op112 success 2020-10-23 21:26:35.140168+00:00 2020-10-23 21:26:35.485512+00:00 looper3 2020-10-23 21:17:13+00:00 op113 success 2020-10-23 21:26:39.554502+00:00 2020-10-23 21:26:39.961489+00:00 looper3 2020-10-23 21:17:13+00:00 op114 success 2020-10-23 21:26:44.930419+00:00 2020-10-23 21:26:45.157095+00:00 looper3 2020-10-23 21:17:13+00:00 op115 success 2020-10-23 21:26:49.535788+00:00 2020-10-23 21:26:49.874870+00:00 looper3 2020-10-23 21:17:13+00:00 op116 success 2020-10-23 21:26:56.055537+00:00 2020-10-23 21:26:56.308052+00:00 looper3 2020-10-23 21:17:13+00:00 op117 success 2020-10-23 21:27:02.425583+00:00 2020-10-23 21:27:03.127741+00:00 looper3 2020-10-23 21:17:13+00:00 op118 success 2020-10-23 21:27:08.836306+00:00 2020-10-23 21:27:09.153191+00:00 looper3 2020-10-23 21:17:13+00:00 op119 success 2020-10-23 21:27:13.895605+00:00 2020-10-23 21:27:14.281990+00:00 looper3 2020-10-23 21:17:13+00:00 op120 success 2020-10-23 21:27:18.584650+00:00 2020-10-23 21:27:18.822823+00:00 looper3 2020-10-23 21:17:13+00:00 op121 success 2020-10-23 21:27:22.664722+00:00 2020-10-23 21:27:23.014480+00:00 looper3 2020-10-23 21:17:13+00:00 op122 success 2020-10-23 21:27:26.712924+00:00 2020-10-23 21:27:26.972650+00:00 looper3 2020-10-23 21:17:13+00:00 op123 success 2020-10-23 21:27:32.036956+00:00 2020-10-23 21:27:32.361320+00:00 looper3 2020-10-23 21:17:13+00:00 op124 success 2020-10-23 21:27:36.810894+00:00 2020-10-23 21:27:37.182029+00:00 looper3 2020-10-23 21:17:13+00:00 op125 success 2020-10-23 21:27:42.851365+00:00 2020-10-23 21:27:43.380218+00:00 
looper3 2020-10-23 21:17:13+00:00 op126 success 2020-10-23 21:27:48.310599+00:00 2020-10-23 21:27:48.577403+00:00 looper3 2020-10-23 21:17:13+00:00 op127 success 2020-10-23 21:27:53.211326+00:00 2020-10-23 21:27:53.439337+00:00 looper3 2020-10-23 21:17:13+00:00 op128 success 2020-10-23 21:27:58.002698+00:00 2020-10-23 21:27:58.290021+00:00 looper3 2020-10-23 21:17:13+00:00 op129 success 2020-10-23 21:28:03.485333+00:00 2020-10-23 21:28:03.762078+00:00 looper3 2020-10-23 21:17:13+00:00 op130 success 2020-10-23 21:28:09.683348+00:00 2020-10-23 21:28:09.929574+00:00 looper3 2020-10-23 21:17:13+00:00 op131 success 2020-10-23 21:28:16.570294+00:00 2020-10-23 21:28:17.147087+00:00 looper3 2020-10-23 21:17:13+00:00 op132 success 2020-10-23 21:28:23.927208+00:00 2020-10-23 21:28:24.181573+00:00 looper3 2020-10-23 21:17:13+00:00 op133 success 2020-10-23 21:28:28.708544+00:00 2020-10-23 21:28:29.076826+00:00 looper3 2020-10-23 21:17:13+00:00 op134 success 2020-10-23 21:28:38.620420+00:00 2020-10-23 21:28:38.924474+00:00 looper3 2020-10-23 21:17:13+00:00 op135 success 2020-10-23 21:28:45.489772+00:00 2020-10-23 21:28:45.917260+00:00 looper3 2020-10-23 21:17:13+00:00 op136 success 2020-10-23 21:28:52.430764+00:00 2020-10-23 21:28:52.815847+00:00 looper3 2020-10-23 21:17:13+00:00 op137 success 2020-10-23 21:28:58.523439+00:00 2020-10-23 21:28:58.970640+00:00 looper3 2020-10-23 21:17:13+00:00 op138 success 2020-10-23 21:29:05.590208+00:00 2020-10-23 21:29:05.871180+00:00 looper3 2020-10-23 21:17:13+00:00 op139 success 2020-10-23 21:29:12.985216+00:00 2020-10-23 21:29:13.263131+00:00 looper3 2020-10-23 21:17:13+00:00 op140 success 2020-10-23 21:29:20.192175+00:00 2020-10-23 21:29:20.593287+00:00 looper3 2020-10-23 21:17:13+00:00 op141 success 2020-10-23 21:29:30.988325+00:00 2020-10-23 21:29:31.434159+00:00 looper3 2020-10-23 21:17:13+00:00 op142 success 2020-10-23 21:29:36.657638+00:00 2020-10-23 21:29:37.037321+00:00 looper3 2020-10-23 21:17:13+00:00 op143 success 2020-10-23 21:29:43.844682+00:00 2020-10-23 21:29:44.140033+00:00 looper3 2020-10-23 21:17:13+00:00 op144 success 2020-10-23 21:29:50.888780+00:00 2020-10-23 21:29:51.072692+00:00 looper3 2020-10-23 21:17:13+00:00 op145 success 2020-10-23 21:29:56.428214+00:00 2020-10-23 21:29:56.796678+00:00 looper3 2020-10-23 21:17:13+00:00 op146 success 2020-10-23 21:30:02.124052+00:00 2020-10-23 21:30:02.431191+00:00 looper3 2020-10-23 21:17:13+00:00 op147 success 2020-10-23 21:30:06.604598+00:00 2020-10-23 21:30:06.823305+00:00 looper3 2020-10-23 21:17:13+00:00 op148 success 2020-10-23 21:30:11.007231+00:00 2020-10-23 21:30:11.345026+00:00 looper3 2020-10-23 21:17:13+00:00 op149 success 2020-10-23 21:30:15.551605+00:00 2020-10-23 21:30:15.773243+00:00 looper3 2020-10-23 21:17:13+00:00 op150 success 2020-10-23 21:30:21.094714+00:00 2020-10-23 21:30:21.434791+00:00 looper3 2020-10-23 21:17:13+00:00 op151 success 2020-10-23 21:30:27.232358+00:00 2020-10-23 21:30:27.754859+00:00 looper3 2020-10-23 21:17:13+00:00 op152 success 2020-10-23 21:30:35.714311+00:00 2020-10-23 21:30:35.983769+00:00 looper3 2020-10-23 21:17:13+00:00 op153 success 2020-10-23 21:30:41.957830+00:00 2020-10-23 21:30:42.223755+00:00 looper3 2020-10-23 21:17:13+00:00 op154 success 2020-10-23 21:30:46.948933+00:00 2020-10-23 21:30:47.440620+00:00 looper3 2020-10-23 21:17:13+00:00 op155 success 2020-10-23 21:30:52.764943+00:00 2020-10-23 21:30:53.317498+00:00 looper3 2020-10-23 21:17:13+00:00 op156 success 2020-10-23 21:31:00.429350+00:00 2020-10-23 21:31:00.883742+00:00 looper3 2020-10-23 
21:17:13+00:00 op157 success 2020-10-23 21:31:06.639038+00:00 2020-10-23 21:31:06.911108+00:00 looper3 2020-10-23 21:17:13+00:00 op158 success 2020-10-23 21:31:10.871669+00:00 2020-10-23 21:31:11.136209+00:00 looper3 2020-10-23 21:17:13+00:00 op159 success 2020-10-23 21:31:15.495684+00:00 2020-10-23 21:31:15.941744+00:00 looper3 2020-10-23 21:17:13+00:00 op160 success 2020-10-23 21:31:21.201543+00:00 2020-10-23 21:31:21.520760+00:00 looper3 2020-10-23 21:17:13+00:00 op161 success 2020-10-23 21:31:26.874064+00:00 2020-10-23 21:31:27.150572+00:00 looper3 2020-10-23 21:17:13+00:00 op162 success 2020-10-23 21:31:30.866699+00:00 2020-10-23 21:31:31.105930+00:00 looper3 2020-10-23 21:17:13+00:00 op163 success 2020-10-23 21:31:36.357858+00:00 2020-10-23 21:31:36.582599+00:00 looper3 2020-10-23 21:17:13+00:00 op164 success 2020-10-23 21:31:40.883959+00:00 2020-10-23 21:31:41.142082+00:00 looper3 2020-10-23 21:17:13+00:00 op165 success 2020-10-23 21:31:45.867105+00:00 2020-10-23 21:31:46.305160+00:00 looper3 2020-10-23 21:17:13+00:00 op166 success 2020-10-23 21:31:51.923422+00:00 2020-10-23 21:31:52.271170+00:00 looper3 2020-10-23 21:17:13+00:00 op167 success 2020-10-23 21:31:57.100056+00:00 2020-10-23 21:31:57.330318+00:00 looper3 2020-10-23 21:17:13+00:00 op168 success 2020-10-23 21:32:02.018708+00:00 2020-10-23 21:32:02.313921+00:00 looper3 2020-10-23 21:17:13+00:00 op169 success 2020-10-23 21:32:06.554984+00:00 2020-10-23 21:32:06.804953+00:00 looper3 2020-10-23 21:17:13+00:00 op170 success 2020-10-23 21:32:10.747118+00:00 2020-10-23 21:32:11.031180+00:00 looper3 2020-10-23 21:17:13+00:00 op171 success 2020-10-23 21:32:17.234422+00:00 2020-10-23 21:32:17.439711+00:00 looper3 2020-10-23 21:17:13+00:00 op172 success 2020-10-23 21:32:24.961672+00:00 2020-10-23 21:32:25.254101+00:00 looper3 2020-10-23 21:17:13+00:00 op173 success 2020-10-23 21:32:31.206376+00:00 2020-10-23 21:32:31.653372+00:00 looper3 2020-10-23 21:17:13+00:00 op174 success 2020-10-23 21:32:37.801675+00:00 2020-10-23 21:32:38.450521+00:00 looper3 2020-10-23 21:17:13+00:00 op175 success 2020-10-23 21:32:43.086747+00:00 2020-10-23 21:32:43.425551+00:00 looper3 2020-10-23 21:17:13+00:00 op176 success 2020-10-23 21:32:47.982276+00:00 2020-10-23 21:32:48.480893+00:00 looper3 2020-10-23 21:17:13+00:00 op177 success 2020-10-23 21:32:53.532761+00:00 2020-10-23 21:32:53.842269+00:00 looper3 2020-10-23 21:17:13+00:00 op178 success 2020-10-23 21:32:59.126766+00:00 2020-10-23 21:32:59.435870+00:00 looper3 2020-10-23 21:17:13+00:00 op179 success 2020-10-23 21:33:04.048557+00:00 2020-10-23 21:33:04.437552+00:00 looper3 2020-10-23 21:17:13+00:00 op180 success 2020-10-23 21:33:10.480207+00:00 2020-10-23 21:33:10.734086+00:00 looper3 2020-10-23 21:17:13+00:00 op181 success 2020-10-23 21:33:15.643803+00:00 2020-10-23 21:33:16.049284+00:00 looper3 2020-10-23 21:17:13+00:00 op182 success 2020-10-23 21:33:20.412503+00:00 2020-10-23 21:33:20.714681+00:00 looper3 2020-10-23 21:17:13+00:00 op183 success 2020-10-23 21:33:25.670301+00:00 2020-10-23 21:33:25.937619+00:00 looper3 2020-10-23 21:17:13+00:00 op184 success 2020-10-23 21:33:31.417915+00:00 2020-10-23 21:33:31.693941+00:00 looper3 2020-10-23 21:17:13+00:00 op185 success 2020-10-23 21:33:37.015528+00:00 2020-10-23 21:33:37.333272+00:00 looper3 2020-10-23 21:17:13+00:00 op186 success 2020-10-23 21:33:43.171019+00:00 2020-10-23 21:33:43.537836+00:00

dag code for looper3 in https://issues.apache.org/jira/browse/AIRFLOW-6920
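(The exact DAG is in that ticket; a hypothetical reconstruction of such a looper, 200 trivial tasks chained strictly in sequence so each start time exposes the scheduler's task lag, might look like this:)

# Hypothetical sketch, not the actual looper3 code from the JIRA ticket.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

with DAG("looper3", start_date=datetime(2020, 10, 1),
         schedule_interval=None, catchup=False) as dag:
    prev = None
    for i in range(1, 201):
        op = DummyOperator(task_id="op{}".format(i))
        if prev is not None:
            prev >> op
        prev = op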

tooptoop4 avatar Oct 24 '20 00:10 tooptoop4

@shlomiken is there a fix/workaround for this issue?

Some of the scheduler's performance issues were fixed in version 1.10.7; at least for our use case it was a major improvement, and we didn't get into the "scheduler is stuck" scenario.

shlomiken avatar Oct 25 '20 13:10 shlomiken

@ryw time has not changed much :(

@tooptoop4 Yeah, would love more details on how you tested it, i.e. which Executor, what resources etc

kaxil avatar Oct 26 '20 15:10 kaxil

@tooptoop4 What version were you running?

Testing on Alpha1 I see this result on the LocalExecutor:

   dag   | task_id |          start_date           |           end_date            
---------+---------+-------------------------------+-------------------------------
 looper3 | op1     | 2020-10-26 15:34:48.334339+00 | 2020-10-26 15:34:48.583849+00
 looper3 | op2     | 2020-10-26 15:34:48.249266+00 | 2020-10-26 15:34:48.549773+00
 looper3 | op3     | 2020-10-26 15:34:48.274539+00 | 2020-10-26 15:34:48.538254+00
 looper3 | op4     | 2020-10-26 15:34:48.277586+00 | 2020-10-26 15:34:48.564638+00
 looper3 | op5     | 2020-10-26 15:34:48.333745+00 | 2020-10-26 15:34:48.599082+00
 looper3 | op6     | 2020-10-26 15:34:48.253816+00 | 2020-10-26 15:34:48.572588+00
 looper3 | op7     | 2020-10-26 15:34:48.267808+00 | 2020-10-26 15:34:48.542517+00
 looper3 | op8     | 2020-10-26 15:34:48.238346+00 | 2020-10-26 15:34:48.470832+00
 looper3 | op9     | 2020-10-26 15:34:48.312878+00 | 2020-10-26 15:34:48.538258+00
 looper3 | op10    | 2020-10-26 15:34:48.323253+00 | 2020-10-26 15:34:48.563398+00
 looper3 | op11    | 2020-10-26 15:34:48.253816+00 | 2020-10-26 15:34:48.558089+00
 looper3 | op12    | 2020-10-26 15:34:48.326777+00 | 2020-10-26 15:34:48.631332+00
 looper3 | op13    | 2020-10-26 15:34:48.397406+00 | 2020-10-26 15:34:48.634616+00
 looper3 | op14    | 2020-10-26 15:34:48.373018+00 | 2020-10-26 15:34:48.646307+00
 looper3 | op15    | 2020-10-26 15:34:48.366809+00 | 2020-10-26 15:34:48.59908+00
 looper3 | op16    | 2020-10-26 15:34:48.338096+00 | 2020-10-26 15:34:48.587398+00
 looper3 | op17    | 2020-10-26 15:34:50.65509+00  | 2020-10-26 15:34:51.010709+00
...
 looper3 | op194   | 2020-10-26 15:35:17.451716+00 | 2020-10-26 15:35:17.633909+00
 looper3 | op195   | 2020-10-26 15:35:17.485059+00 | 2020-10-26 15:35:17.678551+00
 looper3 | op196   | 2020-10-26 15:35:17.476964+00 | 2020-10-26 15:35:17.665511+00
 looper3 | op197   | 2020-10-26 15:35:17.465667+00 | 2020-10-26 15:35:17.654104+00
 looper3 | op198   | 2020-10-26 15:35:17.466149+00 | 2020-10-26 15:35:17.651209+00
 looper3 | op199   | 2020-10-26 15:35:17.497657+00 | 2020-10-26 15:35:17.685343+00
 looper3 | op200   | 2020-10-26 15:35:17.492176+00 | 2020-10-26 15:35:17.675262+00

Total run time for this dag run is 31.614024s.

ashb avatar Oct 26 '20 15:10 ashb

@ashb - How are you? Just wanted to remind everyone that the tests we made that were problematic were mostly on celery, even with 1 executor or many (back then in the test it didn't matter).

shlomiken avatar Oct 26 '20 15:10 shlomiken

@shlomiken :wave:

Testing with CeleryExecutor

   dag   | task_id |          start_date           |           end_date            
---------+---------+-------------------------------+-------------------------------
 looper3 | op1     | 2020-10-26 16:13:45.769389+00 | 2020-10-26 16:13:45.910303+00
 looper3 | op2     | 2020-10-26 16:13:46.547622+00 | 2020-10-26 16:13:46.644183+00
 looper3 | op3     | 2020-10-26 16:13:45.901544+00 | 2020-10-26 16:13:46.041624+00
 looper3 | op4     | 2020-10-26 16:13:45.925278+00 | 2020-10-26 16:13:46.060622+00
 looper3 | op5     | 2020-10-26 16:13:45.907147+00 | 2020-10-26 16:13:46.047497+00
 looper3 | op6     | 2020-10-26 16:13:51.287763+00 | 2020-10-26 16:13:51.481049+00
 looper3 | op7     | 2020-10-26 16:13:51.269082+00 | 2020-10-26 16:13:51.445426+00
 looper3 | op8     | 2020-10-26 16:13:51.348665+00 | 2020-10-26 16:13:51.525376+00
 looper3 | op9     | 2020-10-26 16:13:51.356509+00 | 2020-10-26 16:13:51.524184+00
 looper3 | op10    | 2020-10-26 16:13:51.180377+00 | 2020-10-26 16:13:51.381662+00
 looper3 | op11    | 2020-10-26 16:13:51.945769+00 | 2020-10-26 16:13:52.048007+00
 looper3 | op12    | 2020-10-26 16:13:51.307221+00 | 2020-10-26 16:13:51.498063+00
 looper3 | op13    | 2020-10-26 16:13:51.263068+00 | 2020-10-26 16:13:51.419905+00
 looper3 | op14    | 2020-10-26 16:13:51.322174+00 | 2020-10-26 16:13:51.508973+00
 looper3 | op15    | 2020-10-26 16:13:52.031608+00 | 2020-10-26 16:13:52.139532+00
 looper3 | op16    | 2020-10-26 16:13:52.012843+00 | 2020-10-26 16:13:52.115407+00
 looper3 | op17    | 2020-10-26 16:13:52.045491+00 | 2020-10-26 16:13:52.148687+00
...

Celery is basically the same.

ashb avatar Oct 26 '20 16:10 ashb

@kaxil @ashb A single pool slot, as I'm testing fully sequential tasks.

tooptoop4 avatar Oct 26 '20 16:10 tooptoop4

@tooptoop4 With a sequential dag:

 looper3 | op1     | 2020-10-26 16:32:11.125396+00 | 2020-10-26 16:32:11.208969+00
 looper3 | op2     | 2020-10-26 16:32:12.319367+00 | 2020-10-26 16:32:12.40748+00
 looper3 | op3     | 2020-10-26 16:32:13.471998+00 | 2020-10-26 16:32:13.555303+00
 looper3 | op4     | 2020-10-26 16:32:14.631682+00 | 2020-10-26 16:32:14.714908+00
 looper3 | op5     | 2020-10-26 16:32:15.837069+00 | 2020-10-26 16:32:15.942473+00
 looper3 | op6     | 2020-10-26 16:32:17.014756+00 | 2020-10-26 16:32:17.098437+00
 looper3 | op7     | 2020-10-26 16:32:18.227378+00 | 2020-10-26 16:32:18.321089+00
 looper3 | op8     | 2020-10-26 16:32:19.384662+00 | 2020-10-26 16:32:19.46615+00
 looper3 | op9     | 2020-10-26 16:32:20.580557+00 | 2020-10-26 16:32:20.666001+00
 looper3 | op10    | 2020-10-26 16:32:21.779499+00 | 2020-10-26 16:32:21.864664+00
 looper3 | op11    | 2020-10-26 16:32:22.93437+00  | 2020-10-26 16:32:23.027736+00
 looper3 | op12    | 2020-10-26 16:32:24.083956+00 | 2020-10-26 16:32:24.16761+00
 looper3 | op13    | 2020-10-26 16:32:25.273135+00 | 2020-10-26 16:32:25.359319+00
 looper3 | op14    | 2020-10-26 16:32:26.424024+00 | 2020-10-26 16:32:26.516319+00
 looper3 | op15    | 2020-10-26 16:32:27.61346+00  | 2020-10-26 16:32:27.699136+00
 looper3 | op16    | 2020-10-26 16:32:28.808298+00 | 2020-10-26 16:32:28.893568+00
 looper3 | op17    | 2020-10-26 16:32:29.995106+00 | 2020-10-26 16:32:30.078256+00
 looper3 | op18    | 2020-10-26 16:32:31.191905+00 | 2020-10-26 16:32:31.276655+00

ashb avatar Oct 26 '20 16:10 ashb

So the task lag is about 5x better in the 2.0.0 alpha -- it's still slower than I'd like, even if it is much, much faster than 1.10.

ashb avatar Oct 26 '20 16:10 ashb

What config did you change to get the gap between tasks down to around 1 second?

tooptoop4 avatar Oct 26 '20 18:10 tooptoop4

No config changed - I used the new 2.0.0 alpha 2

ashb avatar Oct 26 '20 19:10 ashb

FYI we've also made it easy to run Airflow 2 alpha locally using the astro CLI.

  1. Install the CLI:

curl -sSL https://install.astronomer.io | sudo bash -s -- v0.21.0

  2. Create an Airflow project w/ astro dev init

  3. Change Dockerfile to:

FROM astronomerio/ap-airflow:2.0.0-1.dev3-buster-onbuild

  4. You can install packages in requirements.txt using this format:
apache-airflow-providers-google
apache-airflow-providers-snowflake
apache-airflow-providers-http
apache-airflow-providers-postgres

If you have some dependency conflicts you can designate what version to use with this format:

requests<2.24.0
idna<2.10

I had to do this for snowflake-connector-python package to work, for example...

  5. Then run astro dev start to fire everything up locally w/ Docker Compose

Note: you need to have Docker installed on your computer as well for this to work.

ryw avatar Oct 26 '20 20:10 ryw