airflow-guides

Update airflow-sagemaker.md

Open magdagultekin opened this issue 3 years ago • 1 comments

I got an error: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://astro-onboarding/iris/results/guide-train-iris/output/model.tar.gz. There's an additional results.csv folder added to the S3OutputPath.

After removing it, all tasks ran successfully, but the last task produces a test.csv.out file. I believe I should expect a CSV file, is that correct? If so, I wasn't able to find the cause.
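For reference, SageMaker batch transform names each output object after its input file with .out appended, so a test.csv input would be expected to yield test.csv.out (whether the contents are CSV depends on the transform's output settings). A minimal sketch of that naming rule, where the function name and the example prefixes are hypothetical:

```python
import posixpath


def transform_output_key(input_key: str, output_prefix: str) -> str:
    """Return the S3 key batch transform would be expected to write for one input.

    Assumption: the output keeps the input's file name and gains a ".out" suffix.
    """
    filename = posixpath.basename(input_key)
    return posixpath.join(output_prefix, filename + ".out")


# Hypothetical prefixes, used only for illustration.
print(transform_output_key("iris/test.csv", "iris/results"))
```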

Also, I triggered this DAG twice and it caused some issues:

  • train_model: from the logs:
[2022-07-21, 12:05:19 UTC] {sagemaker.py:665} INFO - Found existing training job with name 'train-iris'.
[2022-07-21, 12:05:19 UTC] {sagemaker.py:668} INFO - Incremented training job name to 'train-iris-2'.
[2022-07-21, 12:05:19 UTC] {sagemaker.py:647} INFO - Creating SageMaker training job train-iris-2.

But the next task is looking for train-iris, not train-iris-2. Since config in SageMakerModelOperator is not templated, I created a custom operator so that model_config can look as follows:

model_config = {
    "ExecutionRoleArn": role,
    "ModelName": model_name,
    "PrimaryContainer": {
        "Mode": "SingleModel",
        "Image": "404615174143.dkr.ecr.us-east-2.amazonaws.com/knn",
        # Pull the actual training job name (e.g. train-iris-2) from XCom at render time.
        "ModelDataUrl": "s3://{0}/{1}/{2}/output/model.tar.gz".format(
            s3_bucket,
            output_s3_key,
            '{{ ti.xcom_pull(task_ids="train_model")["Training"]["TrainingJobName"] }}',
        ),
    },
}
  • create_model: An error occurred (ValidationException) when calling the CreateModel operation: Cannot create already existing model "arn:aws:sagemaker:us-east-2:043672736276:model/iris-knn".
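To illustrate the first point, here is a rough sketch of how the model artifact path follows the training job name. It assumes the artifact lands under the output path at <job name>/output/model.tar.gz, as in the error message above. The function name and the role ARN are hypothetical; the bucket, key, and image are the ones from this issue.

```python
def build_model_config(role, model_name, s3_bucket, output_s3_key, training_job_name):
    """Build a CreateModel-style config that points at a specific training job's artifact."""
    # Assumed layout: s3://<bucket>/<output key>/<training job name>/output/model.tar.gz
    model_data_url = (
        f"s3://{s3_bucket}/{output_s3_key}/{training_job_name}/output/model.tar.gz"
    )
    return {
        "ExecutionRoleArn": role,
        "ModelName": model_name,
        "PrimaryContainer": {
            "Mode": "SingleModel",
            "Image": "404615174143.dkr.ecr.us-east-2.amazonaws.com/knn",
            "ModelDataUrl": model_data_url,
        },
    }


config = build_model_config(
    role="arn:aws:iam::123456789012:role/example-sagemaker-role",  # hypothetical ARN
    model_name="iris-knn",
    s3_bucket="astro-onboarding",
    output_s3_key="iris/results",
    training_job_name="train-iris-2",  # the incremented name reported in the logs
)
print(config["PrimaryContainer"]["ModelDataUrl"])
```

With the incremented job name passed in, the URL points at train-iris-2 rather than train-iris, which is what the templated XCom value is meant to achieve.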

I'm wondering: if the DAG ran every day, wouldn't these issues occur as well?
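One possible way around the name collisions, sketched under assumptions, would be to give each run its own training job and model name by appending a timestamp. The helper name is hypothetical, and the 63-character limit and the letters, digits, and hyphens character set are my understanding of SageMaker's naming rules, so please check them against the API reference. In a DAG, a templated value such as Airflow's {{ ts_nodash }} macro could play the role of the timestamp.

```python
import re
from datetime import datetime

MAX_NAME_LENGTH = 63  # assumed SageMaker limit for job and model names


def unique_name(base: str, run_time: datetime) -> str:
    """Append a run timestamp to a base name so that each run gets a distinct name."""
    suffix = run_time.strftime("%Y%m%dT%H%M%S")
    # Replace anything other than letters, digits, and hyphens with a hyphen.
    cleaned = re.sub(r"[^a-zA-Z0-9-]", "-", base).strip("-")
    # Leave room for the joining hyphen and the timestamp suffix.
    room = MAX_NAME_LENGTH - len(suffix) - 1
    return f"{cleaned[:room].rstrip('-')}-{suffix}"


print(unique_name("iris-knn", datetime(2022, 7, 21, 12, 5, 19)))
```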

I have way more questions than answers and please forgive me if they're dumb, I've just started looking at SageMaker 🙏

magdagultekin avatar Jul 22 '22 09:07 magdagultekin

Hey @magdagultekin thanks for this - honestly I think it would be best to have someone with an ML background take a look at this one. I wrote the original, but I'm far from an expert (obviously, since there are some issues you've found :) ...), and I don't know the answers to all your questions. I'll see if maybe someone from the data team could help take a look!

kentdanas avatar Jul 25 '22 16:07 kentdanas

@magdagultekin FYI, @fhoda made some updates to this example that we'll publish shortly after we migrate guides to their new home on our docs site, so I'm going to close this PR.

kentdanas avatar Sep 28 '22 18:09 kentdanas