Enable dlt to run on managed AWS airflow MWAA

Open adrianbr opened this issue 1 year ago • 3 comments

Feature description

User: "I've been at a standstill actually. It looks to me like the constraints from MWAA are going to prohibit me from using Snowflake as a destination. Writing to S3 is a viable replacement, but I'm still having trouble here as well. The most recent version of Airflow available on MWAA is 2.7.2. At the moment, Airflow is showing some import conflicts. It can't see the path to locate PipelineTasksGroup in my DAG, nor can it find DltResource in __init__.py. There is also a version conflict with s3fs, which relies on aiobotocore. The constraint on aiobotocore here is 2.6.0 which is not"

Are you a dlt user?

I'd consider using dlt, but it's lacking a feature I need.

Use case

Run dlt on managed AWS Airflow (MWAA). The user is hitting library conflicts; writing to S3 seems like it could be easier than writing to Snowflake.

You can ask the user for more information here: https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1707496343595039

adrianbr, Feb 23 '24 15:02

@adrianbr the only way to fix this is to try it ourselves.

  1. There's a local runner: https://github.com/aws/aws-mwaa-local-runner/tree/v2.7.2 (see also the MWAA dependency docs: https://docs.aws.amazon.com/mwaa/latest/userguide/working-dags-dependencies.html)
  2. We need to figure out how to run our helper on it.
  3. Possibly update our Airflow CI workflow to run some DAGs on MWAA.

rudolfix, Feb 24 '24 11:02

I might be able to help here. I have been using dlt with MWAA successfully. I'm not writing to Snowflake, but have run into problems with the constraints multiple times.

My workaround was to add --constraint /dev/null to the uploaded requirements.txt file, which overrides the default constraints imposed by Airflow. It seems hacky, but AWS actually encourages this if you don't like the imposed constraints.

Idea came from this article
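
For anyone who wants to try this, here is a minimal sketch of what such a requirements.txt could look like, generated from Python so the layout is explicit. It assumes pip's documented spelling of the option (`--constraint`) and that the override goes on the first line; the package list is a hypothetical placeholder, not a tested set of versions for MWAA.

```python
# Hypothetical package list for illustration only; replace it with the
# packages (and pins) your own DAGs actually need.
PACKAGES = ["dlt[filesystem]", "s3fs"]


def build_requirements(packages, constraint="/dev/null"):
    """Return requirements.txt content that starts with a constraint override.

    Pointing the constraint at /dev/null (an empty file) means pip applies
    no version constraints to the packages listed below it.
    """
    lines = [f"--constraint {constraint}"]
    lines.extend(packages)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file content; redirect it to requirements.txt before uploading.
    print(build_requirements(PACKAGES), end="")
```

With the constraints overridden, pip resolves versions on its own, so it is probably worth pinning the packages you care about yourself.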

rubenhelsloot, Aug 23 '24 12:08

We had a similar issue on v2.2.2 (the --constraint override is not needed in this version, though) and decided to install additional packages directly in a virtual environment using the PythonVirtualenvOperator. Note that virtualenv must first be added to your requirements.txt.

https://docs.aws.amazon.com/mwaa/latest/userguide/samples-virtualenv.html

jabjakub, Aug 26 '24 11:08