Misleading error message from PySparkProcessor.get_run_args when submit_py_files is not a list
Describe the bug When using the PySparkProcessor class and its methods, a misleading error message is thrown. The method PySparkProcessor.get_run_args produces the error below when pipeline.upsert(role_arn=role) is called and the submit_py_files argument is supplied as a string rather than a Python list. We initially assumed the fault was a SageMaker Pipelines error, but later learned it was a problem in the PySpark processing step.
Error thrown when calling pipeline.upsert(role_arn=role) (this consumed a lot of our time):
"PermissionError: [Errno 13] Permission denied: '/opt/.sagemakerinternal/conda/pkgs/brotlipy-0.7.0-py37h27cfd23_10039ghk6y6y'"
To reproduce
- Create a PySparkProcessor.
- Pass the optional input submit_py_files as a string rather than a Python list.
- Create the SageMaker pipeline with one step called preprocess.
- Call pipeline.upsert(role_arn=role) and observe the error above.
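The misleading downstream failure is consistent with how Python treats strings: a string is iterable, so code expecting a list of file paths silently iterates over individual characters instead of failing fast. A minimal sketch of this mechanism (`normalize_py_files` is a hypothetical helper for illustration, not the SDK's actual code):

```python
def normalize_py_files(submit_py_files):
    # Code that expects a list will iterate over whatever it receives.
    # A string is iterable too, so each *character* is treated as a
    # separate "file path" instead of raising an immediate error.
    return [f for f in submit_py_files]

paths_as_list = normalize_py_files(["s3://bucket/helpers.py"])
paths_as_string = normalize_py_files("s3://bucket/helpers.py")

print(paths_as_list)         # ['s3://bucket/helpers.py'] -- one entry, as intended
print(len(paths_as_string))  # 22 single-character "paths"
```

This is why the failure only surfaces much later, deep inside the processing container, as an unrelated-looking error.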
Expected behavior The PySparkProcessor class should validate all of its inputs with isinstance() checks to ensure the correct datatypes, raising a clear TypeError when submit_py_files is not a list. The docstring in the code already documents the expected types.
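The suggested fix can be sketched as follows; `validate_submit_py_files` is a hypothetical guard, not part of the SageMaker SDK:

```python
def validate_submit_py_files(submit_py_files):
    """Hypothetical type guard illustrating the requested isinstance() check."""
    if submit_py_files is None:
        return None
    # Reject strings and other non-list inputs up front, so the caller
    # gets a clear TypeError instead of an opaque container-side failure.
    if not isinstance(submit_py_files, list) or not all(
        isinstance(f, str) for f in submit_py_files
    ):
        raise TypeError(
            "submit_py_files must be a list of str, got "
            f"{type(submit_py_files).__name__}"
        )
    return submit_py_files
```

With such a check, passing `submit_py_files="s3://bucket/helpers.py"` would fail immediately at get_run_args time with an actionable message, rather than much later during pipeline.upsert(role_arn=role).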
Screenshots or logs "PermissionError: [Errno 13] Permission denied: '/opt/.sagemakerinternal/conda/pkgs/brotlipy-0.7.0-py37h27cfd23_10039ghk6y6y'" raised when creating the SageMaker pipeline, specifically when calling pipeline.upsert(role_arn=role).
System information A description of your system. Please provide:
- SageMaker Python SDK version: 2.145.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PySparkProcessor
- Framework version: 2.4
- Python version: 3.9
- CPU or GPU: CPU
- Custom Docker image (Y/N):N