sagemaker-python-sdk

Sagemaker TSX: Error with num_samples

Open Alex-Wenner-FHR opened this issue 9 months ago • 0 comments

Describe the bug

org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
    process()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 274, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/lib/python3.9/site-packages/analyzer/analyzers/timeseries/timeseries_asymmetric_shap_analyzer.py", line 251, in explain
    instance_explanation=explainer.explain(self._to_explainer_input(row), baseline_config=baseline_config),
  File "/usr/local/lib/python3.9/site-packages/explainers/shap/asymmetric_shap/asymmetric_shap.py", line 94, in explain
    return self._explain_time_series(input_dataset, baseline_config or TimeSeriesBaselineConfig())
  File "/usr/local/lib/python3.9/site-packages/explainers/shap/asymmetric_shap/asymmetric_shap.py", line 121, in _explain_time_series
    return self._compute_feature_attributions(input_dataset, synthetic_dataset, baseline)
  File "/usr/local/lib/python3.9/site-packages/explainers/shap/asymmetric_shap/asymmetric_shap.py", line 179, in _compute_feature_attributions
    inference_result = self._model(synthetic_dataset.dataset)
  File "/usr/local/lib/python3.9/site-packages/analyzer/predictor/predictor.py", line 63, in __call__
    return self.predict(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/analyzer/predictor/predictor.py", line 328, in predict
    return np.array(predicted_labels).reshape(
ValueError: cannot reshape array of size 7722 into shape (289,786,26)

This error appears to be directly related to the num_samples parameter of AsymmetricShapleyValueConfig. My understanding is that, with fine_grained granularity, num_samples should be (dimension of target time series + dimension of related time series)^2. In my use case the target dimension is 1 and there are 16 related time series, so 17^2 = 289. That is the value I am specifying: num_samples = 289
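As a sanity check on the numbers in the error message, the following is a minimal Python sketch of the arithmetic only. The variable names are illustrative and are not part of the SageMaker SDK or the analyzer code. The traceback does not say what each axis of the target shape represents, so nothing here assumes it beyond the first axis matching num_samples.

```python
import math

# Values copied from the ValueError in the traceback.
actual_size = 7722              # size of the array returned by the predictor
target_shape = (289, 786, 26)   # shape the analyzer tried to reshape into

# num_samples as computed in this report:
# (target dimension + related time series dimension) ** 2
num_samples = (1 + 16) ** 2
print(num_samples)              # 289, matches the first axis of target_shape

# Number of elements the reshape would need.
required_size = math.prod(target_shape)
print(required_size)            # 5906004, far more than the 7722 returned

# Which axes of the target shape divide the returned size evenly?
divisible = [dim for dim in target_shape if actual_size % dim == 0]
print(divisible)                # [26]
print(actual_size // 26)        # 297
```

The returned size is an exact multiple of 26 only (7722 = 297 x 26), and not of 289 or 786. The traceback alone does not explain why.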

To reproduce

I am unsure how to reproduce this if following those specifications works for others.

Expected behavior

I would expect the analysis to run without the reshape error when num_samples is set as described above.

System information

  • SageMaker Python SDK version: 2.218.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
  • Framework version: N/A
  • Python version: 3.9
  • CPU or GPU: Both
  • Custom Docker image (Y/N): N

Alex-Wenner-FHR, May 03 '24 18:05