aws-step-functions-data-science-sdk-python
timestamp mismatch when using code_location

Open AtsunoriFujita opened this issue 4 years ago • 1 comments

Hi,

When code_location is set on the estimator passed to TrainingStep(), the timestamp in the uploaded S3 path and the timestamp in sagemaker_submit_directory do not match (they differ by about 400 ms). This causes the execution to fail.

In a plain SageMaker training job, the two timestamps match even when code_location is used.

S3 uploaded path: s3://my-bucket/model/sagemaker-xgboost-2020-06-10-06-29-37-910/source/sourcedir.tar.gz

sagemaker_submit_directory: "s3://my-bucket/model/sagemaker-xgboost-2020-06-10-06-29-38-323/source/sourcedir.tar.gz"
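For anyone who wants to check the gap on their own run, here is a minimal sketch that pulls the timestamp suffix out of each path and computes the difference. The helper name job_timestamp and the regular expression are my own illustration, not part of either SDK, and the sketch assumes the suffix always has the year-month-day-hour-minute-second-millisecond form seen in the two paths above.

```python
import re
from datetime import datetime, timedelta

# The two paths reported in this issue.
uploaded = "s3://my-bucket/model/sagemaker-xgboost-2020-06-10-06-29-37-910/source/sourcedir.tar.gz"
submit_dir = "s3://my-bucket/model/sagemaker-xgboost-2020-06-10-06-29-38-323/source/sourcedir.tar.gz"

# Assumed suffix layout: YYYY-MM-DD-HH-MM-SS-mmm (milliseconds last).
STAMP = re.compile(r"\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}-\d{3}")


def job_timestamp(s3_uri):
    """Hypothetical helper: parse the timestamp suffix embedded in an S3 path."""
    match = STAMP.search(s3_uri)
    if match is None:
        raise ValueError("no timestamp suffix found in {}".format(s3_uri))
    # %f accepts 1 to 6 digits, so "910" is read as 910 milliseconds.
    return datetime.strptime(match.group(0), "%Y-%m-%d-%H-%M-%S-%f")


delta = job_timestamp(submit_dir) - job_timestamp(uploaded)
delta_ms = delta // timedelta(milliseconds=1)
print("uploaded path and sagemaker_submit_directory differ by {} ms".format(delta_ms))
```

For the two paths in this report the difference comes out at 413 ms, which is the "about 400 ms" mentioned above. A result of 0 ms would mean both values point at the same prefix.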

# Open Source distributed script mode
# region, bucket_name, hyperparams and role are defined elsewhere
import boto3
from sagemaker.session import s3_input, Session
from sagemaker.xgboost.estimator import XGBoost

boto_session = boto3.Session(region_name=region)
session = Session(boto_session=boto_session)

output_path = 's3://{}/{}'.format(bucket_name, 'model')

xgb_script_mode_estimator = XGBoost(
    entry_point='xgboost.py',
    source_dir='source',
    framework_version='0.90-2', # Note: framework_version is mandatory
    hyperparameters=hyperparams,
    role=role,
    train_instance_count=1, 
    train_instance_type='ml.m5.2xlarge',
    code_location=output_path, # ← Causes the mismatch
    output_path=output_path
)

AtsunoriFujita • Jun 10 '20 07:06

Hi @AtsunoriFujita, sorry for the late response!

Thank you for bringing this to our attention. We will need to provide a fix so that the behaviour is consistent with a SageMaker training job. Tagging this as a bug.

ca-nguyen • Oct 01 '21 06:10