amazon-sagemaker-examples
[Bug Report]
Link to the notebook: Train a TensorFlow model with a SageMaker Training Job and track it using SageMaker Experiments
Describe the bug: When executing the notebook, model training (the 8th cell in the notebook) fails with:
ParamValidationError: Parameter validation failed:
Unknown parameter in ProfilerConfig: "DisableProfiler", must be one of: S3OutputPath, ProfilingIntervalInMilliseconds, ProfilingParameters
The bug is reproduced in SageMaker Studio domains in ap-southeast-1 and us-east-2.
To reproduce: Run the notebook step by step.
Logs (error trace):
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: tensorflow-training-2022-12-20-10-27-40-801
---------------------------------------------------------------------------
ParamValidationError Traceback (most recent call last)
<ipython-input-8-952c129da21f> in <module>
30 )
31
---> 32 est.fit()
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline_context.py in wrapper(*args, **kwargs)
270 return _StepArguments(retrieve_caller_name(self_instance), run_func, *args, **kwargs)
271
--> 272 return run_func(*args, **kwargs)
273
274 return wrapper
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
1128
1129 experiment_config = check_and_get_run_experiment_config(experiment_config)
-> 1130 self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
1131 self.jobs.append(self.latest_training_job)
1132 if wait:
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs, experiment_config)
2046 train_args = cls._get_train_args(estimator, inputs, experiment_config)
2047
-> 2048 estimator.sagemaker_session.train(**train_args)
2049
2050 return cls(estimator.sagemaker_session, estimator._current_job_name)
/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image_uri, algorithm_arn, encrypt_inter_container_traffic, use_spot_instances, checkpoint_s3_uri, checkpoint_local_path, experiment_config, debugger_rule_configs, debugger_hook_config, tensorboard_output_config, enable_sagemaker_metrics, profiler_rule_configs, profiler_config, environment, retry_strategy)
625 self.sagemaker_client.create_training_job(**request)
626
--> 627 self._intercept_create_request(train_request, submit, self.train.__name__)
628
629 def _get_train_request( # noqa: C901
/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in _intercept_create_request(self, request, create, func_name)
4654 func_name (str): the name of the function needed intercepting
4655 """
-> 4656 return create(request)
4657
4658
/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in submit(request)
623 LOGGER.info("Creating training-job with name: %s", job_name)
624 LOGGER.debug("train request: %s", json.dumps(request, indent=4))
--> 625 self.sagemaker_client.create_training_job(**request)
626
627 self._intercept_create_request(train_request, submit, self.train.__name__)
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
528 )
529 # The "self" in this scope is referring to the BaseClient.
--> 530 return self._make_api_call(operation_name, kwargs)
531
532 _api_call.__name__ = str(py_operation_name)
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
922 endpoint_url=endpoint_url,
923 context=request_context,
--> 924 headers=additional_headers,
925 )
926 resolve_checksum_context(request_dict, operation_model, api_params)
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _convert_to_request_dict(self, api_params, operation_model, endpoint_url, context, headers, set_user_agent_header)
989 )
990 request_dict = self._serializer.serialize_to_request(
--> 991 api_params, operation_model
992 )
993 if not self._client_config.inject_host_prefix:
/opt/conda/lib/python3.7/site-packages/botocore/validate.py in serialize_to_request(self, parameters, operation_model)
379 )
380 if report.has_errors():
--> 381 raise ParamValidationError(report=report.generate_report())
382 return self._serializer.serialize_to_request(
383 parameters, operation_model
ParamValidationError: Parameter validation failed:
Unknown parameter in ProfilerConfig: "DisableProfiler", must be one of: S3OutputPath, ProfilingIntervalInMilliseconds, ProfilingParameters
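The traceback ends in botocore/validate.py, so the request is rejected on the client before anything reaches SageMaker: the request built by the SDK contains a `DisableProfiler` key that the `ProfilerConfig` shape in the loaded SageMaker service model does not define. The sketch below reproduces that check offline. It assumes botocore is importable in the kernel; the helper `unknown_keys` and the request fragment (including the bucket name) are hypothetical illustrations, not the exact request the SDK builds.

```python
"""Offline sketch: check a ProfilerConfig fragment against the loaded SageMaker model."""


def unknown_keys(params, allowed):
    """Return the keys of `params` that are not members of the shape, sorted."""
    return sorted(set(params) - set(allowed))


def main():
    try:
        import botocore.session
        from botocore.validate import ParamValidator
    except ImportError:
        print("botocore is not installed in this environment")
        return

    # Load the SageMaker service model from disk; no credentials or network needed.
    model = botocore.session.get_session().get_service_model("sagemaker")
    shape = model.shape_for("ProfilerConfig")
    print("botocore version:", botocore.__version__)
    print("ProfilerConfig members:", list(shape.members))

    # Hypothetical fragment containing the key that the failing SDK version sends.
    fragment = {"S3OutputPath": "s3://example-bucket/profiler", "DisableProfiler": False}
    print("Unknown keys:", unknown_keys(fragment, shape.members))

    # Same validator class that raised ParamValidationError in the trace above.
    report = ParamValidator().validate(fragment, shape)
    print(report.generate_report() if report.has_errors() else "fragment is valid")


if __name__ == "__main__":
    main()
```

If `DisableProfiler` is not in the printed member list, the service model the kernel loads predates that field, regardless of which SDK version is installed.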
SageMaker Python SDK version: 2.125.0
Boto3 version: 1.26.33
Output of pip list:
Package Version
------------------------------------ -----------------
absl-py 1.3.0
aiobotocore 2.4.1
aiohttp 3.8.3
aioitertools 0.11.0
aiosignal 1.3.1
alabaster 0.7.12
anaconda-client 1.7.2
anaconda-project 0.8.3
ansi2html 1.8.0
anyio 3.6.2
argh 0.26.2
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asn1crypto 1.3.0
astroid 2.12.13
astropy 4.0
astunparse 1.6.3
async-timeout 4.0.2
asynctest 0.13.0
atomicwrites 1.3.0
attrs 22.1.0
autopep8 1.4.4
autovizwidget 0.20.0
awscli 1.27.24
Babel 2.11.0
backcall 0.1.0
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4 4.8.2
bitarray 1.2.1
bkcharts 0.2
bleach 5.0.1
bokeh 1.4.0
boto 2.49.0
boto3 1.26.33
botocore 1.29.33
Bottleneck 1.3.2
brotlipy 0.7.0
cached-property 1.5.2
cachetools 5.2.0
certifi 2022.9.24
cffi 1.15.0
chardet 3.0.4
charset-normalizer 2.0.4
Click 7.0
cloudpickle 2.2.0
clyent 1.2.2
colorama 0.4.3
conda 22.9.0
conda-package-handling 1.8.1
contextlib2 0.6.0.post1
cryptography 38.0.4
cycler 0.10.0
Cython 0.29.15
cytoolz 0.10.1
dash 2.7.0
dash-core-components 2.0.0
dash-html-components 2.0.0
dash-table 5.0.0
dask 2022.2.0
decorator 4.4.1
defusedxml 0.6.0
diff-match-patch 20181111
dill 0.3.6
distributed 2022.2.0
distro 1.8.0
docker 6.0.1
docker-compose 1.29.2
dockerpty 0.4.1
docopt 0.6.2
docutils 0.16
dparse 0.6.2
entrypoints 0.3
et-xmlfile 1.0.1
fastcache 1.1.0
fastjsonschema 2.16.2
filelock 3.0.12
flake8 3.7.9
Flask 1.1.1
flatbuffers 22.12.6
frozenlist 1.3.3
fsspec 2022.11.0
future 0.18.2
gast 0.4.0
gevent 1.4.0
glob2 0.7
gmpy2 2.0.8
google-auth 2.15.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
greenlet 0.4.15
grpcio 1.51.1
h5py 2.10.0
hdijupyterutils 0.20.0
HeapDict 1.0.1
html5lib 1.0.1
hypothesis 5.5.4
idna 2.8
imageio 2.6.1
imagesize 1.2.0
importlib-metadata 4.13.0
intervaltree 3.0.2
ipykernel 5.1.4
ipython 7.34.0
ipython_genutils 0.2.0
ipywidgets 7.5.1
isort 4.3.21
itsdangerous 1.1.0
jdcal 1.4.1
jedi 0.18.2
jeepney 0.4.2
Jinja2 3.1.2
jmespath 1.0.1
joblib 0.14.1
json5 0.9.1
jsonschema 3.2.0
jupyter 1.0.0
jupyter_client 7.4.8
jupyter-console 6.1.0
jupyter_core 4.12.0
jupyter-dash 0.4.2
jupyter-server 1.23.3
jupyterlab 1.2.21
jupyterlab-pygments 0.2.2
jupyterlab-server 1.0.6
keras 2.11.0
keyring 21.1.0
kiwisolver 1.1.0
lazy-object-proxy 1.4.3
libarchive-c 2.8
libclang 14.0.6
lief 0.9.0
llvmlite 0.39.1
locket 0.2.0
lxml 4.9.1
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.1.3
matplotlib-inline 0.1.6
mccabe 0.6.1
mistune 0.8.4
mkl-fft 1.0.15
mkl-random 1.1.0
mkl-service 2.3.0
mock 4.0.1
more-itertools 8.2.0
mpmath 1.1.0
msgpack 0.6.1
multidict 6.0.3
multipledispatch 0.6.0
multiprocess 0.70.14
nbclassic 0.4.8
nbclient 0.7.2
nbconvert 6.5.4
nbformat 5.7.0
nest-asyncio 1.5.6
networkx 2.4
nltk 3.7
nose 1.3.7
notebook 6.5.2
notebook_shim 0.2.2
numba 0.56.4
numexpr 2.7.1
numpy 1.21.6
numpydoc 0.9.2
oauthlib 3.2.2
olefile 0.46
openpyxl 3.0.3
opt-einsum 3.3.0
packaging 20.1
pandas 1.3.5
pandocfilters 1.4.2
parso 0.8.3
partd 1.1.0
path 13.1.0
pathlib2 2.3.5
pathos 0.3.0
pathtools 0.1.2
patsy 0.5.1
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.3.0
pip 22.3.1
pkginfo 1.5.0.1
platformdirs 2.6.0
plotly 5.8.2
pluggy 0.13.1
ply 3.11
pox 0.3.2
ppft 1.7.6.6
prometheus-client 0.7.1
prompt-toolkit 3.0.3
protobuf 3.19.6
protobuf3-to-dict 0.1.5
psutil 5.6.7
ptyprocess 0.6.0
pure-sasl 0.6.2
py 1.11.0
pyarrow 10.0.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.5.0
pycosat 0.6.3
pycparser 2.19
pycrypto 2.6.1
pycurl 7.43.0.5
pydocstyle 4.0.1
pyflakes 2.1.1
pyfunctional 1.4.3
Pygments 2.13.0
PyHive 0.6.5
pykerberos 1.2.1
pylint 2.15.8
pyodbc 4.0.0-unsupported
pyOpenSSL 22.1.0
pyparsing 2.4.6
pyrsistent 0.15.7
PySocks 1.7.1
pytest 5.3.5
pytest-arraydiff 0.3
pytest-astropy 0.8.0
pytest-astropy-header 0.1.2
pytest-doctestplus 0.5.0
pytest-openfiles 0.4.0
pytest-remotedata 0.3.2
python-dateutil 2.8.2
python-dotenv 0.21.0
python-jsonrpc-server 0.3.4
python-language-server 0.31.7
pytz 2019.3
PyWavelets 1.1.1
pyxdg 0.26
PyYAML 6.0
pyzmq 24.0.1
QDarkStyle 2.8
QtAwesome 0.6.1
qtconsole 4.6.0
QtPy 1.9.0
regex 2022.10.31
requests 2.28.1
requests-kerberos 0.12.0
requests-oauthlib 1.3.1
retrying 1.3.4
rope 0.16.0
rsa 4.9
Rtree 0.9.3
ruamel_yaml 0.15.87
s3fs 0.4.2
s3transfer 0.6.0
sagemaker 2.125.0
sagemaker-data-insights 0.3.3
sagemaker-datawrangler 0.3.8
sagemaker-scikit-learn-extension 2.5.0
sagemaker-studio-analytics-extension 0.0.14
sagemaker-studio-sparkmagic-lib 0.1.4
sasl 0.2.1
schema 0.7.5
scikit-image 0.16.2
scikit-learn 0.22.1
scipy 1.4.1
seaborn 0.10.0
SecretStorage 3.1.2
Send2Trash 1.8.0
setuptools 59.3.0
simplegeneric 0.8.1
singledispatch 3.4.0.3
six 1.14.0
smclarify 0.3
smdebug-rulesconfig 1.0.1
sniffio 1.3.0
snowballstemmer 2.0.0
sortedcollections 1.1.2
sortedcontainers 2.1.0
soupsieve 1.9.5
sparkmagic 0.20.0
Sphinx 2.4.0
sphinxcontrib-applehelp 1.0.1
sphinxcontrib-devhelp 1.0.1
sphinxcontrib-htmlhelp 1.0.2
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.2
sphinxcontrib-serializinghtml 1.1.3
sphinxcontrib-websupport 1.2.0
spyder 4.0.1
spyder-kernels 1.8.1
SQLAlchemy 1.3.13
statsmodels 0.11.0
sympy 1.5.1
tables 3.6.1
tabulate 0.9.0
tblib 1.6.0
tenacity 8.1.0
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.11.0
tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.29.0
termcolor 2.1.1
terminado 0.8.3
testpath 0.4.4
texttable 1.6.7
thrift 0.13.0
thrift-sasl 0.4.3
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
tomlkit 0.11.6
toolz 0.10.0
tornado 6.2
tqdm 4.42.1
traitlets 5.6.0
typed-ast 1.5.4
typing_extensions 4.4.0
ujson 5.6.0
unicodecsv 0.14.1
urllib3 1.26.13
watchdog 0.10.2
wcwidth 0.1.8
webencodings 0.5.1
websocket-client 0.59.0
Werkzeug 2.2.2
wheel 0.34.2
widgetsnbextension 3.5.1
wrapt 1.11.2
wurlitzer 2.0.0
xlrd 1.2.0
XlsxWriter 1.2.7
xlwt 1.3.0
yapf 0.28.0
yarl 1.8.2
zict 1.0.0
zipp 3.11.0
I'm seeing the same thing in us-east-2 with the SKLearn, TensorFlow, and XGBoost estimators as well.
Is this happening only in Studio, or for other jobs too? This commit: https://github.com/aws/sagemaker-python-sdk/commit/019d5a4b232cd4d287dff35c6a8ba9681ed4c0ca added the disable_profiler flag, and botocore v1.29.33 seems to have this flag available as well.
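If the installed botocore already defines `DisableProfiler` but the error persists, it may help to check which copy of the SageMaker service model the kernel actually loads. botocore's data loader searches directories in order (paths from `AWS_DATA_PATH`, then `~/.aws/models`, then the `data` directory bundled with botocore), so a stale `sagemaker` model in an earlier directory could shadow the bundled one. That this is the cause on older Studio user profiles is an assumption not confirmed in this thread, although it would be consistent with later reports here that newly created users are not affected. `dirs_with_service` is a hypothetical helper name.

```python
"""Sketch: list botocore data directories that contain a SageMaker service model."""
import os


def dirs_with_service(search_paths, service="sagemaker"):
    """Return the search paths that contain a model directory for `service`, in order.

    If more than one path is returned, models in earlier paths can shadow later ones.
    """
    return [p for p in search_paths if os.path.isdir(os.path.join(p, service))]


def main():
    try:
        import botocore
        import botocore.session
    except ImportError:
        print("botocore is not installed in this environment")
        return

    loader = botocore.session.get_session().get_component("data_loader")
    print("botocore", botocore.__version__, "loaded from", botocore.__file__)
    print("AWS_DATA_PATH =", os.environ.get("AWS_DATA_PATH"))
    for path in dirs_with_service(loader.search_paths):
        print("sagemaker model found in:", path)


if __name__ == "__main__":
    main()
```

A single hit inside botocore's own `data` directory is the expected result; an extra hit earlier in the list would be worth inspecting.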
@acere Can you create a new user and try again?
I got the same error message. Downgrading sagemaker to version 2.123.0 with the following command solved my problem:
pip install sagemaker==2.123.0
@acere are you still experiencing this issue? Running that notebook on Studio (Python 3 (Data Science), us-east-2) with sagemaker 2.128.0 right now, I am able to run all cells with no issue.
@claytonparnell the problem still occurs for older SageMaker Studio users (created before Dec 2022). There is no issue for Studio users created after Dec 2022 with any PySDK version > 2.123.0.
OK, so the solution would be to create a new SageMaker Studio user now (after Dec 2022)?
This is solved by using sagemaker==2.123.0; are there plans to fix this in newer versions?
PyTorch 1.13 and py39 are not available in 2.123. Is there an ETA for getting this fixed?
Creating a new user in the domain and then using sagemaker==2.143.0 worked for me.
I tried the same notebook on the same instance and did not have the issue.
I believe the issue is fixed in the latest Data Science image. Please try shutting down the kernel (from the top menu, open Kernel -> Shut Down) and try again.
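Restarting the kernel matters because a running Python process keeps the modules it has already imported: `pip install` or a newer image changes the files on disk, not the `sagemaker`, `boto3`, or `botocore` modules already loaded in the kernel. A minimal sketch to spot that mismatch, comparing each loaded module's `__version__` with the installed distribution's metadata. It assumes Python 3.8+ `importlib.metadata`, or the `importlib_metadata` backport on Python 3.7 (importlib-metadata appears in the pip list above); `stale_packages` and `collect` are hypothetical helper names.

```python
"""Sketch: detect packages whose loaded version differs from the installed one."""
import sys

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7 with the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def stale_packages(loaded, installed):
    """Return names whose loaded version is known and differs from the installed one."""
    return sorted(
        name
        for name, v in loaded.items()
        if v is not None and installed.get(name) is not None and v != installed[name]
    )


def collect(names=("sagemaker", "boto3", "botocore")):
    """Return two dicts: version already imported (or None), version on disk (or None)."""
    loaded, installed = {}, {}
    for name in names:
        module = sys.modules.get(name)
        loaded[name] = getattr(module, "__version__", None)
        try:
            installed[name] = version(name)
        except PackageNotFoundError:
            installed[name] = None
    return loaded, installed


if __name__ == "__main__":
    loaded, installed = collect()
    for name in loaded:
        print(f"{name}: loaded={loaded[name]} installed={installed[name]}")
    stale = stale_packages(loaded, installed)
    if stale:
        print("Restart the kernel; stale in memory:", ", ".join(stale))
```

Run it in the notebook after `import sagemaker`; a module that has not been imported yet reports `loaded=None` and is never flagged.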
I'm getting this error:
Missing required parameter in ProfilerConfig: "S3OutputPath"
Unknown parameter in ProfilerConfig: "DisableProfiler", must be one of: S3OutputPath, ProfilingIntervalInMilliseconds, ProfilingParameters
I have tried all of the suggestions above, but none of them fix the issue!