Swift Glue MVP
Implement the following for the Swift SDK.
Service actions
Service actions can either be implemented as individual functions or incorporated into the scenario, but each service action must be included as an excerpt in the SOS output. A sample action wrapper is sketched after the checklist below.
- [ ] GetCrawler
- [ ] CreateCrawler
- [ ] StartCrawler
- [ ] GetDatabase
- [ ] GetTables
- [ ] CreateJob
- [ ] StartJobRun
- [ ] ListJobs
- [ ] GetJobRuns
- [ ] GetJobRun
- [ ] DeleteJob
- [ ] DeleteTable
- [ ] DeleteDatabase
- [ ] DeleteCrawler
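As a rough illustration, a single action excerpt might look something like the following. This is a minimal sketch, not a final implementation: it assumes the aws-sdk-swift surface for Glue (`GlueClient(region:)`, `GetCrawlerInput`, `DeleteCrawlerInput`, and the `GlueClientTypes` namespace), and the `GlueWrapper` type name and snippet tags are illustrative placeholders.

```swift
import AWSGlue

/// A thin wrapper around the Glue actions used by the scenario.
/// The type name and snippet tags are illustrative only.
struct GlueWrapper {
    let client: GlueClient

    init(region: String) throws {
        // Create the Glue client for the given region.
        self.client = try GlueClient(region: region)
    }

    // snippet-start:[swift.glue.GetCrawler]
    /// Return the crawler with the given name.
    func getCrawler(name: String) async throws -> GlueClientTypes.Crawler? {
        let output = try await client.getCrawler(input: GetCrawlerInput(name: name))
        return output.crawler
    }
    // snippet-end:[swift.glue.GetCrawler]

    // snippet-start:[swift.glue.DeleteCrawler]
    /// Delete the crawler with the given name.
    func deleteCrawler(name: String) async throws {
        _ = try await client.deleteCrawler(input: DeleteCrawlerInput(name: name))
    }
    // snippet-end:[swift.glue.DeleteCrawler]
}
```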
Scenario
A scenario runs at a command prompt and prints output to the user describing the result of each service action. A scenario can run in one of two ways: straight through, printing progress as it goes, or as an interactive question/answer script (a simple prompt helper is sketched after the topic links below).
This scenario follows the console steps outlined in these two topics:
- https://docs.aws.amazon.com/glue/latest/ug/tutorial-add-crawler.html
- https://docs.aws.amazon.com/glue/latest/ug/tutorial-create-job.html
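For the interactive form, a couple of plain Foundation helpers over `readLine` are enough. This sketch makes no SDK assumptions; the function names are placeholders.

```swift
import Foundation

/// Print a message and wait for the user to press Enter before continuing,
/// so the straight-through output can also be paced interactively.
func askToContinue(_ message: String) {
    print(message)
    print("Press Enter to continue...", terminator: " ")
    _ = readLine()
}

/// Ask a yes/no question and return the answer, defaulting to yes.
func askYesNo(_ question: String) -> Bool {
    print("\(question) (y/n)", terminator: " ")
    guard let answer = readLine()?.lowercased() else { return true }
    return !answer.hasPrefix("n")
}
```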
Scaffold resources
These resources are scaffolding for the scenario. Create them with the setup.yaml CloudFormation script or with the CDK app in resources/cdk/glue_role_bucket. Either way, the role name and bucket name are returned as outputs, which you will need in your scenario.
- An S3 bucket.
- An IAM role that trusts glue.amazonaws.com, grants read-write access to the S3 bucket, and has the managed AWSGlueServiceRole policy attached.
You will also need to upload the Python ETL script to the S3 bucket, either manually or through SDK calls (one approach is sketched below):
- The script is at python/example_code/glue/flight_etl_job_script.py. It was generated by the console tutorial, then updated by me to accept the custom arguments used in the demo. If you use it as is, it should work for you.
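If you go the SDK route, the upload could look roughly like this. It's a sketch that assumes the AWSS3 module's `PutObjectInput` and the `.data` case of `ByteStream` for the request body; the region, bucket name, and object key are placeholders taken from the scaffold outputs.

```swift
import Foundation
import AWSS3

/// Upload the ETL job script to the scaffold bucket so Glue can load it.
/// The region, bucket name, and local path are placeholders.
func uploadJobScript(bucketName: String, scriptPath: String) async throws {
    let s3Client = try S3Client(region: "us-east-1")
    let scriptData = try Data(contentsOf: URL(fileURLWithPath: scriptPath))

    let input = PutObjectInput(
        body: .data(scriptData),   // assumes ByteStream's .data case for the body
        bucket: bucketName,
        key: "flight_etl_job_script.py"
    )
    _ = try await s3Client.putObject(input: input)
    print("Uploaded \(scriptPath) to s3://\(bucketName)/flight_etl_job_script.py")
}
```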
Getting started with crawlers and jobs
- Create a crawler, passing it the IAM role and the URL of the public S3 bucket that contains the source data: s3://crawler-public-us-east-1/flight/2016/csv (see the crawler sketch after this list).
- Start the crawler. This takes a few minutes. Loop and call GetCrawler until it returns state 'READY'.
- Get the database created by the crawler and the tables in the database. Display these to the user.
- Create a job, passing it the IAM role and the S3 URL of the Python ETL script you uploaded to the user's bucket, something like s3://doc-example-bucket-123456/flight_etl_job_script.py (see the job sketch after this list).
- Start a job run, passing it the following custom arguments. The ETL script expects these, so the names must match exactly:
- --input_database: [name of the database created by the crawler]
- --input_table: [name of the table created by the crawler]
- --output_bucket_url: [URL to the scaffold bucket you created for the user]
- Loop and get the job run until it returns state 'SUCCEEDED', 'STOPPED', 'FAILED', or 'TIMEOUT'.
- Output data is stored in a group of files in the user's bucket. Either direct the user to look at them or download a file and display some of the results (see the output sketch after this list).
- List jobs for the user's account (this and the remaining cleanup steps are sketched after this list).
- Get job run detail for a job run.
- Delete the demo job.
- Delete the database and tables.
- Delete the crawler.
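A sketch of the crawler phase, assuming the Glue client exposes createCrawler/startCrawler/getCrawler/getDatabase/getTables with the input shapes shown. The crawler and database names are placeholders, and the polling interval is arbitrary.

```swift
import Foundation
import AWSGlue

/// Create the crawler, run it, and wait until it returns to the READY state,
/// then show the database and tables it produced. Names are placeholders.
func runCrawlerPhase(client: GlueClient, roleArn: String) async throws {
    let crawlerName = "swift-glue-mvp-crawler"
    let databaseName = "swift-glue-mvp-db"

    // Point the crawler at the public flight-data bucket from the tutorial.
    let targets = GlueClientTypes.CrawlerTargets(
        s3Targets: [GlueClientTypes.S3Target(path: "s3://crawler-public-us-east-1/flight/2016/csv")]
    )
    _ = try await client.createCrawler(input: CreateCrawlerInput(
        databaseName: databaseName,
        name: crawlerName,
        role: roleArn,
        targets: targets
    ))

    _ = try await client.startCrawler(input: StartCrawlerInput(name: crawlerName))
    print("Crawler started. This takes a few minutes...")

    // Poll GetCrawler until the crawler returns to the READY state.
    while true {
        let output = try await client.getCrawler(input: GetCrawlerInput(name: crawlerName))
        if output.crawler?.state == .ready { break }
        try await Task.sleep(nanoseconds: 15_000_000_000)   // wait 15 seconds
    }

    // Show the database and tables the crawler created.
    let database = try await client.getDatabase(input: GetDatabaseInput(name: databaseName))
    print("Database: \(database.database?.name ?? "unknown")")

    let tables = try await client.getTables(input: GetTablesInput(databaseName: databaseName))
    for table in tables.tableList ?? [] {
        print("  Table: \(table.name ?? "unnamed")")
    }
}
```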
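The job phase might look roughly like this. The input shapes, the "glueetl" command name, and the Glue version string are assumptions to verify against the current SDK; the job name is a placeholder, and the database and table names come from the crawler phase above.

```swift
import Foundation
import AWSGlue

/// Create the ETL job, start a run with the arguments the script expects,
/// and poll until the run reaches a terminal state.
func runJobPhase(client: GlueClient, roleArn: String, bucketName: String,
                 databaseName: String, tableName: String) async throws {
    let jobName = "swift-glue-mvp-job"
    let scriptLocation = "s3://\(bucketName)/flight_etl_job_script.py"

    // The job runs the Python ETL script uploaded to the scaffold bucket.
    _ = try await client.createJob(input: CreateJobInput(
        command: GlueClientTypes.JobCommand(
            name: "glueetl",
            pythonVersion: "3",
            scriptLocation: scriptLocation
        ),
        glueVersion: "3.0",
        name: jobName,
        role: roleArn
    ))

    // The argument names must match what flight_etl_job_script.py reads.
    let runOutput = try await client.startJobRun(input: StartJobRunInput(
        arguments: [
            "--input_database": databaseName,
            "--input_table": tableName,
            "--output_bucket_url": "s3://\(bucketName)/"
        ],
        jobName: jobName
    ))
    guard let runId = runOutput.jobRunId else { return }

    // Poll GetJobRun until the run finishes.
    let terminalStates: [GlueClientTypes.JobRunState] = [.succeeded, .stopped, .failed, .timeout]
    while true {
        let run = try await client.getJobRun(input: GetJobRunInput(jobName: jobName, runId: runId))
        if let state = run.jobRun?.jobRunState, terminalStates.contains(state) {
            print("Job run finished with state \(state)")
            return
        }
        try await Task.sleep(nanoseconds: 15_000_000_000)   // wait 15 seconds
    }
}
```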
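To show some output without leaving the terminal, one option is to list the user's bucket and print the start of one output file. This sketch assumes the AWSS3 listObjectsV2/getObject calls and that the response body's ByteStream exposes readData(); the region is a placeholder, and the only key it skips is the script uploaded earlier.

```swift
import Foundation
import AWSS3

/// List the job's output files in the user's bucket and show the start of one.
func showJobOutput(bucketName: String) async throws {
    let s3Client = try S3Client(region: "us-east-1")

    // The ETL script writes its output to the bucket as a group of files.
    let listing = try await s3Client.listObjectsV2(input: ListObjectsV2Input(bucket: bucketName))
    guard let key = listing.contents?
        .compactMap({ $0.key })
        .first(where: { $0 != "flight_etl_job_script.py" }) else {
        print("No output files found yet in s3://\(bucketName)/")
        return
    }

    // Download one output file and print the first few lines.
    let object = try await s3Client.getObject(input: GetObjectInput(bucket: bucketName, key: key))
    if let data = try await object.body?.readData(),
       let text = String(data: data, encoding: .utf8) {
        print("First lines of \(key):")
        print(text.split(separator: "\n").prefix(5).joined(separator: "\n"))
    }
}
```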
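Finally, a sketch of the listing and cleanup steps, using the same placeholder names as the sketches above and assuming the delete inputs take the fields shown.

```swift
import AWSGlue

/// List the account's jobs, show detail for the demo job's most recent run,
/// then delete everything the scenario created.
func cleanUp(client: GlueClient, crawlerName: String, databaseName: String,
             tableName: String, jobName: String) async throws {
    // List jobs in the user's account.
    let jobs = try await client.listJobs(input: ListJobsInput())
    print("Jobs in this account: \(jobs.jobNames ?? [])")

    // Get the runs for the demo job and show detail for one of them.
    let runs = try await client.getJobRuns(input: GetJobRunsInput(jobName: jobName))
    if let run = runs.jobRuns?.first, let runId = run.id {
        let detail = try await client.getJobRun(input: GetJobRunInput(jobName: jobName, runId: runId))
        print("Run \(runId) state: \(String(describing: detail.jobRun?.jobRunState))")
    }

    // Delete the demo job, the table and database the crawler created, and the crawler.
    _ = try await client.deleteJob(input: DeleteJobInput(jobName: jobName))
    _ = try await client.deleteTable(input: DeleteTableInput(databaseName: databaseName, name: tableName))
    _ = try await client.deleteDatabase(input: DeleteDatabaseInput(name: databaseName))
    _ = try await client.deleteCrawler(input: DeleteCrawlerInput(name: crawlerName))
    print("Demo resources deleted.")
}
```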
I'm finally going to take this on again after having to stop on it for the GA of the SDK.