sagemaker-python-sdk
SageMaker TSX: Error with num_samples
Describe the bug
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
    process()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 274, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/lib/python3.9/site-packages/analyzer/analyzers/timeseries/timeseries_asymmetric_shap_analyzer.py", line 251, in explain
    instance_explanation=explainer.explain(self._to_explainer_input(row), baseline_config=baseline_config),
  File "/usr/local/lib/python3.9/site-packages/explainers/shap/asymmetric_shap/asymmetric_shap.py", line 94, in explain
    return self._explain_time_series(input_dataset, baseline_config or TimeSeriesBaselineConfig())
  File "/usr/local/lib/python3.9/site-packages/explainers/shap/asymmetric_shap/asymmetric_shap.py", line 121, in _explain_time_series
    return self._compute_feature_attributions(input_dataset, synthetic_dataset, baseline)
  File "/usr/local/lib/python3.9/site-packages/explainers/shap/asymmetric_shap/asymmetric_shap.py", line 179, in _compute_feature_attributions
    inference_result = self._model(synthetic_dataset.dataset)
  File "/usr/local/lib/python3.9/site-packages/analyzer/predictor/predictor.py", line 63, in __call__
    return self.predict(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/analyzer/predictor/predictor.py", line 328, in predict
    return np.array(predicted_labels).reshape(
ValueError: cannot reshape array of size 7722 into shape (289,786,26)
This error appears to be directly related to the num_samples parameter of AsymmetricShapleyValueConfig.
My understanding is that, with fine_grained granularity, num_samples should be (dimension of target time series + dimension of related time series)^2. In my use case the target dimension is 1 and there are 16 related time series, so 17^2 = 289. That is the value I am specifying: num_samples = 289
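As a sanity check on the numbers above, here is a minimal sketch of the arithmetic using only the values quoted in this report. The variable names are illustrative, and what each axis of the requested shape represents is an assumption, since the traceback does not say.

```python
from math import prod

# Values copied from this report; variable names are illustrative only.
target_dims = 1
related_dims = 16
num_samples = (target_dims + related_dims) ** 2  # formula as I understand it

requested_shape = (289, 786, 26)  # shape named in the ValueError
actual_size = 7722                # number of elements the predictor returned

# Total number of elements the reshape would need.
expected_size = prod(requested_shape)

# Which axes of the requested shape divide the returned size evenly.
divisible = {dim: actual_size % dim == 0 for dim in requested_shape}

print("num_samples:", num_samples)
print("expected elements:", expected_size)
print("actual elements:", actual_size)
print("axis divides actual size:", divisible)
```

The first axis of the requested shape matches num_samples (289), but the returned array is far smaller than the reshape needs. Only the last axis (26) divides 7722 evenly, giving 297 rows of 26 values. That might indicate the model is not being called with, or is not returning, one prediction per synthetic sample, but this is a guess rather than a confirmed diagnosis.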
To reproduce
I am unsure how to reproduce this, since the same settings may be working for others.
Expected behavior
I would expect the explainability analysis to complete without the reshape error.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.218.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
- Framework version: N/A
- Python version: 3.9
- CPU or GPU: Both
- Custom Docker image (Y/N): N