Migrate Amazon example DAGs to new design
There is a new design of system tests that was introduced by the AIP-47.
All current example dags need to be migrated and converted into system tests, so they can be run in the CI process automatically before releases.
This is an aggregated issue for all example DAGs related to Amazon provider. It is created to track progress of their migration.
List of paths to example DAGs:
- [x] airflow/providers/amazon/aws/example_dags/example_s3_bucket_tagging.py
- [ ] airflow/providers/amazon/aws/example_dags/example_emr_eks.py
- [x] airflow/providers/amazon/aws/example_dags/example_athena.py
- [x] airflow/providers/amazon/aws/example_dags/example_sqs.py
- [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_nodegroups.py
- [x] airflow/providers/amazon/aws/example_dags/example_dms_full_load_task.py
- [ ] airflow/providers/amazon/aws/example_dags/example_emr.py
- [x] airflow/providers/amazon/aws/example_dags/example_sagemaker.py
- [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_nodegroup_in_one_step.py
- [ ] airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3.py
- [ ] airflow/providers/amazon/aws/example_dags/example_s3_to_sftp.py
- [x] airflow/providers/amazon/aws/example_dags/example_local_to_s3.py
- [x] airflow/providers/amazon/aws/example_dags/example_s3_bucket.py
- [x] airflow/providers/amazon/aws/example_dags/example_google_api_to_s3_transfer_basic.py
- [x] airflow/providers/amazon/aws/example_dags/example_redshift_cluster.py
- [x] airflow/providers/amazon/aws/example_dags/example_lambda.py
- [ ] airflow/providers/amazon/aws/example_dags/example_s3_to_redshift.py
- [ ] airflow/providers/amazon/aws/example_dags/example_imap_attachment_to_s3.py
- [x] airflow/providers/amazon/aws/example_dags/example_ecs_fargate.py
- [x] airflow/providers/amazon/aws/example_dags/example_rds.py
- [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_fargate_in_one_step.py
- [x] airflow/providers/amazon/aws/example_dags/example_batch.py
- [x] airflow/providers/amazon/aws/example_dags/example_datasync_2.py
- [x] airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3_segmented.py
- [x] airflow/providers/amazon/aws/example_dags/example_ecs_ec2.py
- [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_fargate_profile.py
- [x] airflow/providers/amazon/aws/example_dags/example_redshift_to_s3.py
- [x] airflow/providers/amazon/aws/example_dags/example_redshift_sql.py
- [x] airflow/providers/amazon/aws/example_dags/example_sns.py
- [x] airflow/providers/amazon/aws/example_dags/example_google_api_to_s3_transfer_advanced.py
- [x] airflow/providers/amazon/aws/example_dags/example_datasync_1.py
- [ ] airflow/providers/amazon/aws/example_dags/example_glacier_to_gcs.py
- [x] airflow/providers/amazon/aws/example_dags/example_redshift_data_execute_sql.py
- [ ] airflow/providers/amazon/aws/example_dags/example_eks_templated.py
- [ ] airflow/providers/amazon/aws/example_dags/example_sftp_to_s3.py
- [ ] airflow/providers/amazon/aws/example_dags/example_salesforce_to_s3.py
@potiuk - your new handy dandy script will update that checklist auto-magically on a schedule, right? You merged Athena yesterday.
It should update it already. I guess Athena example is still there simply :)
Yep. It needs to be removed ...
Huh. I'm positive I did that but it's definitely still there. I'll do a PR for you in a sec and tag you in it.
Quick update, the following are not checked off above:
-
airflow/providers/amazon/aws/example_dags/example_emr.py
- @syedahsn is working on this one right now.
-
airflow/providers/amazon/aws/example_dags/example_eks_templated.py
- All of the operators in this DAG are tested in other system tests which have already been converted, so there is no added benefit to converting this one. It was initially a demo of how to use Jinja templating that I wrote and contributed when I was learning how to use it, so I do think it still has value as an example DAG. Should we delete this file, move it somewhere else, or just leave it? I feel like if we keep it, then "Example DAGs" is exactly where it should live. It uses AWS operators, but it's specifically an AWS example (if that makes sense?) so maybe move it to a more generic location?
All of the remaining files were deemed out of scope for the current conversion project since they require external services/accounts/servers/etc (for example: requires a MongoDB account, a GCP account, hosting an SFTP server, etc) and we won't be able to commit to running/maintaining them on a regular basis. We may add them at a later date, but they won't be part of this big push.
- airflow/providers/amazon/aws/example_dags/example_s3_to_sftp.py
- airflow/providers/amazon/aws/example_dags/example_sftp_to_s3.py
- airflow/providers/amazon/aws/example_dags/example_imap_attachment_to_s3.py
- I think this one goes here because it needed an SMTP server configured but I'm not certain?
- airflow/providers/amazon/aws/example_dags/example_glacier_to_gcs.py
- airflow/providers/amazon/aws/example_dags/example_salesforce_to_s3.py
ok. So I guess we should be able to close that one soon ? Can we just migrate the example_* files to tests/* anyway so that we do not have two places where to look for them ? Even if those will not be "regular" pytests, having them in tests/system (with a comment that they are not runnable yet) would make for a clean conversion and remove duplicated "examples" link in the documentation (legacy/new). They might even have different sub-folder or naming to clearly distinguish them from "converted" tests.
If they get moved there, then they will be run by our CI and always fail and cause alerts. Maybe a subdirectory? @vincbeck what do you think of adding an excluded directory in the system tests directory??
Moving the example dags in tests directory will not make the CI run them so we are safe on that side. Though I am wondering if we want to have example dags in this directory. That would mix system tests and example dags together. I would be actually inclined to have a subdirectory (e.g. example_dags) just for clarification purposes. WDYT?
Moving the example dags in tests directory will not make the CI run them so we are safe on that side. Though I am wondering if we want to have example dags in this directory. That would mix system tests and example dags together. I would be actually inclined to have a subdirectory (e.g.
example_dags) just for clarification purposes. WDYT?
Subdir is fine.
I think the current state where we have some examples dag under system tests and some in providers is very confusing.
@ferruzzi comment is reasonable and at least for now we can't have system tests for everything (accounts etc..) but the current state where example dags are in two separated parts of the code base is very confusing.
I wonder if we can convert everything to system tests yet mark the relevant ones with missing accounts or other issues as excluded ? Kinda like what we do with quarantined tests.
+1 to the suggestion to move the remaining example DAGs to the system test directory. In the past month, I've had at least three people ask me why we got rid of the example DAGs, and they were not aware of the system tests.
I am not in favor of creating subdir as Airflow users may not understand why only certain ones are in example_dags directory. It might be less confusing if we could filter the System tests using a file name suffix, like example_xyz_test vs example_xyz, as this would provide a clear separation while still being easily understandable.
+1 too.
If I'm not mistaken, that would require changes to the system test runner which I think currently triggers on the "example" prefix. This was a holdout to when they were still example DAGs, so maybe the better solution would be to combine the example dags and the system tests into the tests tree, rename system test files something more obvious like {service}_system_test.py and use the new naming convention to trigger the CI tests.
That would require adjusting links to code snippets in the rst files. Any other side effects that I'm not thinking about?
If I'm not mistaken, that would require changes to the system test runner which I think currently triggers on the "example" prefix. This was a holdout to when they were still example DAGs, so maybe the better solution would be to combine the example dags and the system tests into the tests tree, rename system test files something more obvious like {service}_system_test.py and use the new naming convention to trigger the CI tests.
That would require adjusting links to code snippets in the rst files. Any other side effects that I'm not thinking about?
I think that. Any other side effects would be detected by failing CI (which is very comprehensive for those). This is not a "production" code, so I thinl even breaking small things there by going a bit ahead is fine. I am generally much more bold for test code than for production code (with the exception of making sure that the tests are run and actually test things) - any problems there are at most impacting developers, not users.
Closing as "done"