sagemaker-python-sdk
Clarify the source_dir argument in the FrameworkProcessor.run method
What did you find confusing? Please describe.
I was trying to run a Processing Job in script mode with custom dependencies provided through a requirements.txt file. To do that, I packed both the script and the requirements file into a single example.tar.gz file, uploaded that file to S3, and then passed its full S3 path as the source_dir argument of the FrameworkProcessor.run method.
Note that the documentation does not specify what the name of the file should be. It only says that it needs to be a tar.gz file, so I assumed that the file name does not matter:
https://github.com/aws/sagemaker-python-sdk/blob/95bbe7abc30b7ef842e90c5a7225a5784c3ce4d8/src/sagemaker/processing.py#L1519-L1523
However, it turns out that the file name must be sourcedir.tar.gz. With any other name, the process running inside the container cannot untar the file (it does not know which file to extract), so the job fails.
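For anyone hitting the same problem, here is a minimal sketch of how the archive can be built so that it carries the expected name. It uses only the Python standard library. The helper name build_source_archive and the placeholder file contents are my own, and placing the files at the archive root is an assumption about the layout, not something taken from the SDK documentation.

```python
import tarfile
import tempfile
from pathlib import Path

# The file name that, per this issue, the container expects to find.
ARCHIVE_NAME = "sourcedir.tar.gz"


def build_source_archive(files, out_dir):
    """Pack the given files into out_dir/sourcedir.tar.gz and return its path.

    Hypothetical helper: each file is stored under its base name, i.e. at the
    root of the archive (an assumed layout).
    """
    archive_path = Path(out_dir) / ARCHIVE_NAME
    with tarfile.open(archive_path, "w:gz") as tar:
        for file_path in files:
            file_path = Path(file_path)
            tar.add(file_path, arcname=file_path.name)
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        # Placeholder script and requirements file, for illustration only.
        script = Path(work_dir) / "script.py"
        script.write_text("print('processing')\n")
        requirements = Path(work_dir) / "requirements.txt"
        requirements.write_text("pandas\n")

        archive = build_source_archive([script, requirements], work_dir)
        with tarfile.open(archive) as tar:
            print(archive.name, sorted(tar.getnames()))
```

The resulting file still has to be uploaded to S3 separately, and the S3 URI passed as source_dir must end in sourcedir.tar.gz. The upload and the run call are intentionally not shown here.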
Describe how documentation can be improved
Simply changing the wording from "it must point to a tar.gz file" to "it must point to a sourcedir.tar.gz file" would tell the user which file name is expected.
Additional context
N/A