airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Migrate Amazon example DAGs to new design

Open potiuk opened this issue 4 years ago • 4 comments

There is a new design of system tests that was introduced by the AIP-47.

All current example dags need to be migrated and converted into system tests, so they can be run in the CI process automatically before releases.

This is an aggregated issue for all example DAGs related to Amazon provider. It is created to track progress of their migration.

List of paths to example DAGs:

  • [x] airflow/providers/amazon/aws/example_dags/example_s3_bucket_tagging.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_emr_eks.py
  • [x] airflow/providers/amazon/aws/example_dags/example_athena.py
  • [x] airflow/providers/amazon/aws/example_dags/example_sqs.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_nodegroups.py
  • [x] airflow/providers/amazon/aws/example_dags/example_dms_full_load_task.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_emr.py
  • [x] airflow/providers/amazon/aws/example_dags/example_sagemaker.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_nodegroup_in_one_step.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_s3_to_sftp.py
  • [x] airflow/providers/amazon/aws/example_dags/example_local_to_s3.py
  • [x] airflow/providers/amazon/aws/example_dags/example_s3_bucket.py
  • [x] airflow/providers/amazon/aws/example_dags/example_google_api_to_s3_transfer_basic.py
  • [x] airflow/providers/amazon/aws/example_dags/example_redshift_cluster.py
  • [x] airflow/providers/amazon/aws/example_dags/example_lambda.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_s3_to_redshift.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_imap_attachment_to_s3.py
  • [x] airflow/providers/amazon/aws/example_dags/example_ecs_fargate.py
  • [x] airflow/providers/amazon/aws/example_dags/example_rds.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_fargate_in_one_step.py
  • [x] airflow/providers/amazon/aws/example_dags/example_batch.py
  • [x] airflow/providers/amazon/aws/example_dags/example_datasync_2.py
  • [x] airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3_segmented.py
  • [x] airflow/providers/amazon/aws/example_dags/example_ecs_ec2.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_eks_with_fargate_profile.py
  • [x] airflow/providers/amazon/aws/example_dags/example_redshift_to_s3.py
  • [x] airflow/providers/amazon/aws/example_dags/example_redshift_sql.py
  • [x] airflow/providers/amazon/aws/example_dags/example_sns.py
  • [x] airflow/providers/amazon/aws/example_dags/example_google_api_to_s3_transfer_advanced.py
  • [x] airflow/providers/amazon/aws/example_dags/example_datasync_1.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_glacier_to_gcs.py
  • [x] airflow/providers/amazon/aws/example_dags/example_redshift_data_execute_sql.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_eks_templated.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_sftp_to_s3.py
  • [ ] airflow/providers/amazon/aws/example_dags/example_salesforce_to_s3.py

potiuk avatar Mar 22 '22 14:03 potiuk

@potiuk - your new handy dandy script will update that checklist auto-magically on a schedule, right? You merged Athena yesterday.

ferruzzi avatar Jun 03 '22 18:06 ferruzzi

It should update it already. I guess Athena example is still there simply :)

potiuk avatar Jun 03 '22 18:06 potiuk

Yep. It needs to be removed ...

potiuk avatar Jun 03 '22 18:06 potiuk

Huh. I'm positive I did that but it's definitely still there. I'll do a PR for you in a sec and tag you in it.

ferruzzi avatar Jun 03 '22 19:06 ferruzzi

Quick update, the following are not checked off above:

  • airflow/providers/amazon/aws/example_dags/example_emr.py

    • @syedahsn is working on this one right now.
  • airflow/providers/amazon/aws/example_dags/example_eks_templated.py

    • All of the operators in this DAG are tested in other system tests which have already been converted, so there is no added benefit to converting this one. It was initially a demo of how to use Jinja templating that I wrote and contributed when I was learning how to use it, so I do think it still has value as an example DAG. Should we delete this file, move it somewhere else, or just leave it? I feel like if we keep it, then "Example DAGs" is exactly where it should live. It uses AWS operators, but it's specifically an AWS example (if that makes sense?) so maybe move it to a more generic location?

All of the remaining files were deemed out of scope for the current conversion project since they require external services/accounts/servers/etc (for example: requires a MongoDB account, a GCP account, hosting an SFTP server, etc) and we won't be able to commit to running/maintaining them on a regular basis. We may add them at a later date, but they won't be part of this big push.

  • airflow/providers/amazon/aws/example_dags/example_s3_to_sftp.py
  • airflow/providers/amazon/aws/example_dags/example_sftp_to_s3.py
  • airflow/providers/amazon/aws/example_dags/example_imap_attachment_to_s3.py
    • I think this one goes here because it needed an SMTP server configured but I'm not certain?
  • airflow/providers/amazon/aws/example_dags/example_glacier_to_gcs.py
  • airflow/providers/amazon/aws/example_dags/example_salesforce_to_s3.py

ferruzzi avatar Oct 17 '22 22:10 ferruzzi

ok. So I guess we should be able to close that one soon ? Can we just migrate the example_* files to tests/* anyway so that we do not have two places where to look for them ? Even if those will not be "regular" pytests, having them in tests/system (with a comment that they are not runnable yet) would make for a clean conversion and remove duplicated "examples" link in the documentation (legacy/new). They might even have different sub-folder or naming to clearly distinguish them from "converted" tests.

potiuk avatar Oct 24 '22 18:10 potiuk

If they get moved there, then they will be run by our CI and always fail and cause alerts. Maybe a subdirectory? @vincbeck what do you think of adding an excluded directory in the system tests directory??

ferruzzi avatar Oct 25 '22 00:10 ferruzzi

Moving the example dags in tests directory will not make the CI run them so we are safe on that side. Though I am wondering if we want to have example dags in this directory. That would mix system tests and example dags together. I would be actually inclined to have a subdirectory (e.g. example_dags) just for clarification purposes. WDYT?

vincbeck avatar Oct 25 '22 13:10 vincbeck

Moving the example dags in tests directory will not make the CI run them so we are safe on that side. Though I am wondering if we want to have example dags in this directory. That would mix system tests and example dags together. I would be actually inclined to have a subdirectory (e.g. example_dags) just for clarification purposes. WDYT?

Subdir is fine.

potiuk avatar Oct 26 '22 00:10 potiuk

I think the current state where we have some examples dag under system tests and some in providers is very confusing.

@ferruzzi comment is reasonable and at least for now we can't have system tests for everything (accounts etc..) but the current state where example dags are in two separated parts of the code base is very confusing.

I wonder if we can convert everything to system tests yet mark the relevant ones with missing accounts or other issues as excluded ? Kinda like what we do with quarantined tests.

eladkal avatar Jan 07 '23 15:01 eladkal

+1 to the suggestion to move the remaining example DAGs to the system test directory. In the past month, I've had at least three people ask me why we got rid of the example DAGs, and they were not aware of the system tests.

I am not in favor of creating subdir as Airflow users may not understand why only certain ones are in example_dags directory. It might be less confusing if we could filter the System tests using a file name suffix, like example_xyz_test vs example_xyz, as this would provide a clear separation while still being easily understandable.

shubham22 avatar Jan 09 '23 23:01 shubham22

+1 too.

potiuk avatar Jan 10 '23 11:01 potiuk

If I'm not mistaken, that would require changes to the system test runner which I think currently triggers on the "example" prefix. This was a holdout to when they were still example DAGs, so maybe the better solution would be to combine the example dags and the system tests into the tests tree, rename system test files something more obvious like {service}_system_test.py and use the new naming convention to trigger the CI tests.

That would require adjusting links to code snippets in the rst files. Any other side effects that I'm not thinking about?

ferruzzi avatar Jan 10 '23 20:01 ferruzzi

If I'm not mistaken, that would require changes to the system test runner which I think currently triggers on the "example" prefix. This was a holdout to when they were still example DAGs, so maybe the better solution would be to combine the example dags and the system tests into the tests tree, rename system test files something more obvious like {service}_system_test.py and use the new naming convention to trigger the CI tests.

That would require adjusting links to code snippets in the rst files. Any other side effects that I'm not thinking about?

I think that. Any other side effects would be detected by failing CI (which is very comprehensive for those). This is not a "production" code, so I thinl even breaking small things there by going a bit ahead is fine. I am generally much more bold for test code than for production code (with the exception of making sure that the tests are run and actually test things) - any problems there are at most impacting developers, not users.

potiuk avatar Jan 19 '23 22:01 potiuk

Closing as "done"

potiuk avatar Jun 22 '23 13:06 potiuk