
performance degradation

Open liranc1 opened this issue 9 months ago • 4 comments

Before using Cosmos, the Airflow DAG ran for about 15 minutes for a certain dbt command. After the change to Cosmos, the same dbt command is much more volatile, often taking 20-30 minutes. All of the DAG's Airflow resources stayed the same, and there was no change in the dbt connection details.

I also encountered some tasks randomly failing due to a connection error (Snowflake) that succeeded on the next run. This issue did not occur without Cosmos.

Cosmos configuration used:

ExecutionConfig(dbt_executable_path=DBT_EXECUTABLE_PATH)

RenderConfig( select=["models"], test_behavior=TestBehavior.NONE, load_method=LoadMode.DBT_LS, dbt_deps=False )

ProjectConfig(os.environ["DBT_PROJECT_PATH"], dbt_vars=dbt_vars)

dbt version:

Core:
  • installed: 1.5.6

Plugins:
  • snowflake: 1.5.6

liranc1 avatar Apr 30 '24 06:04 liranc1

To address the performance degradation and intermittent connection errors with your Airflow DAGs using cosmos for dbt tasks, consider the following targeted steps:

  1. Optimize Dependency Installation: If dependencies are installed with each run, this can slow down execution. After the initial run, set install_deps to False to reuse the environment, as shown in the cosmos_manifest_example.py.

  2. Parallel Model Execution: Ensure your Airflow configuration supports parallel task execution to fully utilize dbt's parallel model execution capabilities.

  3. Resource Allocation: Check that your Airflow and dbt tasks have sufficient memory and CPU resources.

  4. Data Volume and Complexity: For large or complex data transformations, optimize your dbt models or break down tasks into smaller segments.

  5. Network Latency: If Airflow and the database are not in the same network, reduce network latency by optimizing your network configuration.
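For the parallel-execution point, Airflow's concurrency ceilings can be raised through its standard `AIRFLOW__SECTION__KEY` environment variables. A sketch with illustrative values; tune them to your executor and available resources:

```shell
# Raise Airflow's concurrency limits so independent dbt models can run in
# parallel. Values below are illustrative, not recommendations.
export AIRFLOW__CORE__PARALLELISM=32               # max simultaneously running tasks across the instance
export AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG=16  # max concurrent tasks within a single DAG run
```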

For the Snowflake connection issues:

  • Ensure Correct Configuration: Verify that your Airflow Snowflake connection is correctly configured, especially for key-pair authentication using environment variables for sensitive information like private_key and private_key_passphrase.
  • Review Airflow and dbt Logs: Detailed error messages in these logs can provide insights into the specific issues.
  • Network and Snowflake Warehouse Performance: Assess network setup and Snowflake warehouse configuration to ensure they are optimized for your workload.
  • Update Dependencies: Ensure all related dependencies are up to date to benefit from performance improvements and bug fixes.
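For the key-pair authentication point, the entire Airflow connection can be supplied via an environment variable so that secrets never live in code. A hedged sketch: the connection ID, paths, and `extra` field names below are illustrative and may vary with your `apache-airflow-providers-snowflake` version, so check the provider docs for the exact keys:

```shell
# Hypothetical connection ID and paths; adjust to your deployment.
# With key-pair auth, passphrase handling is provider-version dependent
# (it is often read from the connection's password field).
export AIRFLOW_CONN_SNOWFLAKE_DEFAULT='{
    "conn_type": "snowflake",
    "login": "my_user",
    "extra": {
        "account": "my_account",
        "warehouse": "my_wh",
        "database": "my_db",
        "role": "my_role",
        "private_key_file": "/opt/airflow/keys/rsa_key.p8"
    }
}'
```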

These steps are derived from analyzing the provided configuration and issues. Implementing these recommendations should help mitigate the performance and connectivity issues you're experiencing.


dosubot[bot] avatar Apr 30 '24 06:04 dosubot[bot]

Hi @liranc1 thanks for sharing this information, could you please confirm which version of Cosmos you were using before and after this performance degradation was observed?

tatiana avatar Apr 30 '24 13:04 tatiana

Cosmos was not used before; I was using the BashOperator. The performance degradation started once I moved to Cosmos. The Cosmos version I used is 1.3.2.

liranc1 avatar Apr 30 '24 14:04 liranc1

@liranc1 could you try out Cosmos 1.4 and let us know if there are any performance improvements?

tatiana avatar May 14 '24 08:05 tatiana

Some progress: #1014.

tatiana avatar Jun 05 '24 06:06 tatiana

The previously mentioned PR, #1014, is under review and seems to have promising results.

tatiana avatar Jun 17 '24 06:06 tatiana

If using LoadMode.DBT_LS, could you please try Cosmos 1.5.0a9, which will be released as a stable version this week?

  • Release notes: https://github.com/astronomer/astronomer-cosmos/releases/tag/astronomer-cosmos-v1.5.0a9
  • PyPI release: https://pypi.org/project/astronomer-cosmos/1.5.0a9/
  • Docs: https://astronomer.github.io/astronomer-cosmos/configuration/caching.html

Some ways to improve the performance using Cosmos 1.4:

1. Can you pre-compile your dbt project?

If yes, this would remove this responsibility from the Airflow DAG processor, greatly reducing DAG parsing time. You could try this by using LoadMode.DBT_MANIFEST and specifying the path to the manifest file:

DbtDag(
    ...,
    render_config=RenderConfig(
        load_method=LoadMode.DBT_MANIFEST
    )
)

More information: https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-manifest
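The path to the pre-compiled manifest itself is supplied through `ProjectConfig`. A sketch, assuming the manifest is produced elsewhere (e.g. `dbt compile` in CI) and shipped with the image; the paths below are illustrative:

```python
from cosmos import DbtDag, ProjectConfig, RenderConfig
from cosmos.constants import LoadMode

DbtDag(
    ...,
    project_config=ProjectConfig(
        "/usr/local/airflow/dbt/my_project",  # illustrative project path
        manifest_path="/usr/local/airflow/dbt/my_project/target/manifest.json",
    ),
    render_config=RenderConfig(load_method=LoadMode.DBT_MANIFEST),
)
```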

2. If you need to use LoadMode.DBT_LS, can you pre-install dbt dependencies in the Airflow scheduler and worker nodes?

If yes, this will avoid Cosmos having to run dbt deps all the time before running any dbt command, both in the scheduler and worker nodes. In that case, you should set:

DbtDag(
    ...,
    operator_args={"install_deps": False},
    render_config=RenderConfig(
        dbt_deps=False
    )
)

More info: https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html
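One way to pre-install the dependencies is to resolve them once at image build time (e.g. in a Dockerfile `RUN` step), so neither the scheduler nor the workers fetch packages at runtime. A sketch with illustrative paths:

```shell
# Run once while building the Airflow image so that dbt_packages/ ships
# with the image instead of being resolved on every task run.
cd /usr/local/airflow/dbt/my_project   # illustrative project location
dbt deps                               # resolves packages.yml into dbt_packages/
```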

3. If you need to use LoadMode.DBT_LS, is your dbt project large? Could you use selectors to select a subset?

jaffle_shop = DbtDag(
    ...,
    render_config=RenderConfig(
        select=["path:analytics"],
    )
)

More info: https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html
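Selectors can also be combined; tag- and path-based selection are both supported, and `exclude` works alongside `select`. A sketch with hypothetical tag and path names:

```python
from cosmos import DbtDag, RenderConfig

DbtDag(
    ...,
    render_config=RenderConfig(
        select=["tag:daily"],             # hypothetical tag in the dbt project
        exclude=["path:models/staging"],  # hypothetical path to leave out
    ),
)
```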

4. Are you able to install dbt in the same Python virtual environment as Airflow?

If this is a possibility, you'll be able to experience significant performance improvements by leveraging the InvocationMode.DBT_RUNNER method, which has been switched on by default since Cosmos 1.4.

More information: https://astronomer.github.io/astronomer-cosmos/getting_started/execution-modes.html#invocation-modes
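When dbt and Airflow share a virtual environment, the invocation mode can also be set explicitly on `ExecutionConfig`. A sketch; since Cosmos 1.4 this mode is already the default, so setting it is only needed to be explicit:

```python
from cosmos import DbtDag, ExecutionConfig
from cosmos.constants import InvocationMode

DbtDag(
    ...,
    execution_config=ExecutionConfig(
        # Invokes dbt programmatically in-process rather than spawning
        # a subprocess for every dbt command.
        invocation_mode=InvocationMode.DBT_RUNNER,
    ),
)
```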

tatiana avatar Jun 26 '24 06:06 tatiana