Update airflow-sagemaker.md
I got an error: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://astro-onboarding/iris/results/guide-train-iris/output/model.tar.gz. The cause is an additional results.csv folder in the S3OutputPath.
After removing it, all tasks ran successfully, but the last task produces a test.csv.out file. I believe I should expect a CSV file, is that correct? If so, I wasn't able to find the cause.
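On the test.csv.out question: SageMaker batch transform names each output object after its input file with .out appended, so the file name itself looks expected rather than a sign of a failed task. A minimal sketch of that naming rule follows; the helper name and the example keys are hypothetical and only illustrate where to look for the result, not what format its contents are in.

```python
import posixpath


def transform_output_key(input_key: str, output_prefix: str) -> str:
    """Return the S3 key where the transform result for input_key should land.

    Assumes the batch transform convention of appending ".out" to the
    input file name under the configured output prefix.
    """
    filename = posixpath.basename(input_key)
    return posixpath.join(output_prefix, filename + ".out")


# Hypothetical keys, for illustration only.
print(transform_output_key("iris/test.csv", "iris/results"))
```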
Also, I triggered this DAG twice and it caused some issues:
- train_model, from the logs:
[2022-07-21, 12:05:19 UTC] {sagemaker.py:665} INFO - Found existing training job with name 'train-iris'.
[2022-07-21, 12:05:19 UTC] {sagemaker.py:668} INFO - Incremented training job name to 'train-iris-2'.
[2022-07-21, 12:05:19 UTC] {sagemaker.py:647} INFO - Creating SageMaker training job train-iris-2.
But the next task looks for train-iris, not train-iris-2. Because config in SageMakerModelOperator is not templated, I created a custom operator so that model_config looks as follows:
model_config = {
    "ExecutionRoleArn": role,
    "ModelName": model_name,
    "PrimaryContainer": {
        "Mode": "SingleModel",
        "Image": "404615174143.dkr.ecr.us-east-2.amazonaws.com/knn",
        # Pull the actual training job name (e.g. train-iris-2) from XCom at render time.
        "ModelDataUrl": "s3://{0}/{1}/{2}/output/model.tar.gz".format(s3_bucket, output_s3_key, '{{ ti.xcom_pull(task_ids="train_model")["Training"]["TrainingJobName"] }}'),
    },
}
- create_model: An error occurred (ValidationException) when calling the CreateModel operation: Cannot create already existing model "arn:aws:sagemaker:us-east-2:043672736276:model/iris-knn".
I'm wondering: if the DAG ran every day, wouldn't these same issues occur on every run after the first?
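One possible way around both collisions is to give each run its own names and to build the model config from the training job name that was actually used. The sketch below is only an illustration under stated assumptions: unique_name and build_model_config are hypothetical helpers, the 63-character limit and allowed characters are my reading of SageMaker's naming rules, and the timestamp format is chosen to resemble Airflow's ts_nodash value. The role ARN in the usage lines is a placeholder.

```python
import re
from datetime import datetime, timezone

MAX_NAME_LENGTH = 63  # assumed SageMaker limit for model and job names


def unique_name(base: str, run_time: datetime) -> str:
    """Append a run timestamp to base so repeated DAG runs do not collide."""
    suffix = run_time.strftime("%Y%m%dT%H%M%S")
    # Keep only letters, digits, and hyphens (assumed allowed characters).
    cleaned = re.sub(r"[^a-zA-Z0-9-]", "-", base).strip("-")
    if not cleaned:
        raise ValueError("base must contain at least one letter or digit")
    trimmed = cleaned[: MAX_NAME_LENGTH - len(suffix) - 1].rstrip("-")
    return f"{trimmed}-{suffix}"


def build_model_config(role, image, s3_bucket, output_s3_key,
                       training_job_name, model_name):
    """Build a CreateModel config that points at the given training job's output."""
    return {
        "ExecutionRoleArn": role,
        "ModelName": model_name,
        "PrimaryContainer": {
            "Mode": "SingleModel",
            "Image": image,
            "ModelDataUrl": (
                f"s3://{s3_bucket}/{output_s3_key}/"
                f"{training_job_name}/output/model.tar.gz"
            ),
        },
    }


# Usage with placeholder values; in a DAG the run time would come from the run context.
run_time = datetime(2022, 7, 21, 12, 5, 19, tzinfo=timezone.utc)
config = build_model_config(
    role="arn:aws:iam::123456789012:role/example-role",  # placeholder
    image="404615174143.dkr.ecr.us-east-2.amazonaws.com/knn",
    s3_bucket="astro-onboarding",
    output_s3_key="iris/results",
    training_job_name="train-iris-2",
    model_name=unique_name("iris-knn", run_time),
)
print(config["ModelName"])
```

With names like these, a daily schedule would create a new model per run instead of failing on the existing one, at the cost of accumulating models that need cleanup.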
I have way more questions than answers, so please forgive me if they're dumb. I've just started looking at SageMaker 🙏
Hey @magdagultekin thanks for this - honestly I think it would be best to have someone with an ML background take a look at this one. I wrote the original, but I'm far from an expert (obviously, since there are some issues you've found :) ...), and I don't know the answers to all your questions. I'll see if maybe someone from the data team could help take a look!
@magdagultekin FYI, @fhoda made some updates to this example that we'll publish shortly after we migrate guides to their new home on our docs site, so I'm going to close this PR.