sagemaker-python-sdk

Clarify the source_dir argument in the FrameworkProcessor.run method

Status: Open. wsykala opened this issue 1 year ago (0 comments).

What did you find confusing? Please describe.

I was trying to run a Processing Job in script mode with custom dependencies provided through a requirements.txt file. To do that, I packed both the script and requirements.txt into a single example.tar.gz file, uploaded that file to S3, and then passed its full S3 path as the source_dir argument of the FrameworkProcessor.run method. Note that the documentation does not specify what the file should be named; it only says that it must be a tar.gz file, so I assumed the file name did not matter: https://github.com/aws/sagemaker-python-sdk/blob/95bbe7abc30b7ef842e90c5a7225a5784c3ce4d8/src/sagemaker/processing.py#L1519-L1523
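A minimal sketch of the packaging step, using only the Python standard library. The helper name `build_sourcedir_archive` and the directory layout are hypothetical; the one detail taken from this issue is that the archive has to be named `sourcedir.tar.gz`. Files are stored at the root of the archive so the entry-point script and requirements.txt are not nested under an extra folder.

```python
import os
import tarfile

# Archive name the container expects, as reported in this issue.
ARCHIVE_NAME = "sourcedir.tar.gz"


def build_sourcedir_archive(src_dir: str, out_dir: str) -> str:
    """Pack the contents of src_dir into out_dir/sourcedir.tar.gz and return its path.

    Hypothetical helper. Members are stored relative to src_dir, so the
    entry-point script and requirements.txt end up at the archive root.
    """
    # List inputs before creating the archive so it never tries to include itself.
    names = sorted(n for n in os.listdir(src_dir) if n != ARCHIVE_NAME)
    archive_path = os.path.join(out_dir, ARCHIVE_NAME)
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            # arcname drops the src_dir prefix; directories are added recursively.
            tar.add(os.path.join(src_dir, name), arcname=name)
    return archive_path
```

The resulting file would then be uploaded to S3 and its S3 URI passed as `source_dir` to `FrameworkProcessor.run`, together with the `code` argument naming the entry-point script (check the `run` docstring for your SDK version on how `code` is resolved relative to `source_dir`).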

However, it turns out that the file name must be sourcedir.tar.gz. With any other name, the process running inside the container cannot untar the archive (it does not know which file to extract), so the job fails.
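Because a wrongly named archive only fails once the job is already running in the container, a local pre-flight check can catch the problem before upload. The sketch below uses a hypothetical `check_source_archive` helper that encodes the rule reported in this issue (the name must be `sourcedir.tar.gz`). It also checks that the entry-point script is present in the archive; that second check is an assumption about how the script is located, not something the issue states.

```python
import os
import tarfile

# The only archive name that worked, per this issue.
EXPECTED_NAME = "sourcedir.tar.gz"


def check_source_archive(path: str, entry_point: str) -> list[str]:
    """Return human-readable problems found in a source_dir archive (empty list if none)."""
    problems = []
    actual = os.path.basename(path)
    if actual != EXPECTED_NAME:
        problems.append(f"archive is named {actual!r}, expected {EXPECTED_NAME!r}")
    try:
        with tarfile.open(path, "r:gz") as tar:
            # normpath turns "./script.py" (as written by `tar -czf x .`) into "script.py".
            members = {os.path.normpath(n) for n in tar.getnames()}
    except (tarfile.TarError, OSError) as exc:
        problems.append(f"cannot read {actual!r} as a tar.gz archive: {exc}")
        return problems
    # Assumption: the entry point is referenced by its path inside the archive.
    if os.path.normpath(entry_point) not in members:
        problems.append(f"entry point {entry_point!r} not found in the archive")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once before deciding whether to upload.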

Describe how documentation can be improved

Simply changing the wording from "it must point to a tar.gz file" to "it must point to a sourcedir.tar.gz file" would tell users which file name is expected.

Additional context

N/A

wsykala, Jul 28 '22 20:07