Enable dlt to run on managed AWS airflow MWAA
Feature description
User: "I've been at a standstill, actually. It looks to me like the constraints from MWAA are going to prohibit me from using Snowflake as a destination. Writing to S3 is a viable replacement, but I'm still having trouble there as well. The most recent version of Airflow available on MWAA is 2.7.2. At the moment, Airflow is showing some import conflicts. It can't find the path to PipelineTasksGroup in my DAG, nor can it find DltResource in __init__.py. There is also a version conflict with s3fs, which relies on aiobotocore. The constraint on aiobotocore here is 2.6.0 which is not
Are you a dlt user?
I'd consider using dlt, but it's lacking a feature I need.
Use case
Run dlt on AWS managed Airflow (MWAA). The user is hitting library conflicts; S3 looks like an easier destination than Snowflake.
You can ask the user for more information here: https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1707496343595039
@adrianbr the only way to fix it is to try that ourselves.
- there's a local runner: https://docs.aws.amazon.com/mwaa/latest/userguide/working-dags-dependencies.html https://github.com/aws/aws-mwaa-local-runner/tree/v2.7.2
- we need to figure out how to run our helper on it
- possibly update our airflow CI workflow to run some dags on MWAA
I might be able to help here. I have been using dlt with MWAA successfully. I'm not writing to Snowflake, but have run into problems with the constraints multiple times.
My workaround was to add --constraints /dev/null to the uploaded requirements.txt file, which overrides the default constraints imposed by Airflow. It seems hacky, but AWS actually encourages this if you don't like the imposed constraints.
Idea came from this article
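For anyone trying the workaround above, a requirements.txt along these lines is roughly what it looks like. This is only a sketch: the package line is an illustrative assumption (not taken from the comment), and pip documents the option as `--constraint` (short form `-c`), so that spelling is used here.

```text
# Point pip at an empty constraints file instead of the default Airflow constraints
--constraint /dev/null

# Illustrative package line (assumption); list whatever your DAGs actually need
dlt[filesystem]
```

With the default constraints out of the way, pip resolves versions on its own, so it is probably worth checking the result in the MWAA local runner before uploading.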
We had a similar issue on v2.2.2 (--constraints was not needed in that version, though) and decided to install the additional packages directly in a virtual environment using the PythonVirtualenvOperator. Note that virtualenv first needs to be listed in your requirements.txt.
https://docs.aws.amazon.com/mwaa/latest/userguide/samples-virtualenv.html