André

19 comments by André

Hi @harthur, Thanks for using Amazon SageMaker! There are two SageMaker clients: the `AmazonSageMaker` client, which is used to create and manage Training Jobs, Endpoints, and such, and the `AmazonSageMakerRuntime`...
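For context, a minimal sketch of the two client types using the AWS SDK for Java v1 builders (which the Spark SDK uses under the hood); the endpoint name and payload here are made up for illustration:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

import com.amazonaws.services.sagemaker.AmazonSageMakerClientBuilder
import com.amazonaws.services.sagemaker.model.ListTrainingJobsRequest
import com.amazonaws.services.sagemakerruntime.AmazonSageMakerRuntimeClientBuilder
import com.amazonaws.services.sagemakerruntime.model.InvokeEndpointRequest

// Control-plane client: creates and manages Training Jobs, Endpoints, and so on.
val sagemaker = AmazonSageMakerClientBuilder.defaultClient()
val trainingJobs = sagemaker.listTrainingJobs(new ListTrainingJobsRequest())

// Runtime client: only talks to already-deployed Endpoints.
val runtime = AmazonSageMakerRuntimeClientBuilder.defaultClient()
val response = runtime.invokeEndpoint(
  new InvokeEndpointRequest()
    .withEndpointName("my-endpoint") // hypothetical endpoint name
    .withContentType("text/csv")
    .withBody(ByteBuffer.wrap("1.0,2.0,3.0".getBytes(StandardCharsets.UTF_8))))
```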

Hey @harthur, I'm not sure exactly when the client is instantiated. It's possible we should make that a `lazy val` or otherwise delay instantiation. Do you have a stack trace...
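The `lazy val` idea in a nutshell; the class and field names below are hypothetical, not the SDK's actual code:

```scala
import com.amazonaws.services.sagemakerruntime.{AmazonSageMakerRuntime, AmazonSageMakerRuntimeClientBuilder}

class EndpointInvoker {
  // With a plain `val`, the client is built as soon as the enclosing object is
  // constructed. With `lazy val`, construction is deferred until the first use,
  // so it happens on whichever JVM actually calls it.
  lazy val runtimeClient: AmazonSageMakerRuntime =
    AmazonSageMakerRuntimeClientBuilder.defaultClient()
}
```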

Hi @harthur, Thanks for the stack trace! Just FYI: I haven't gotten a chance to reproduce this yet, but this definitely seems like a bug. I suppose that workers are...

Hey @harthur, Ah, interesting, thanks for the update! Glad to hear you got it working, but you're right, we should let users build their own client. I've put a...

@haowang-ms89 All SageMakerEstimators rely on Spark's DataFrame writers. The `XGBoostSageMakerEstimator` defaults to writing data in `libsvm` format. Can you try passing in `"csv"` to `trainingSparkDataFormat` (or `"com.databricks.spark.csv"` if you're using...

@haowang-ms89 Sure! I just commented on that issue. You will also have to pass in `Some("csv")` for the `trainingContentType`, or XGBoost will think you're trying to give it LibSVM data....
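Putting this and the previous comment's suggestions together, a rough sketch of the estimator configuration in the Scala SDK; the role ARN and instance types are placeholders, and the exact constructor defaults may differ by version:

```scala
import com.amazonaws.services.sagemaker.sparksdk.IAMRole
import com.amazonaws.services.sagemaker.sparksdk.algorithms.XGBoostSageMakerEstimator

val estimator = new XGBoostSageMakerEstimator(
  sagemakerRole = IAMRole("arn:aws:iam::123456789012:role/SageMakerRole"), // placeholder ARN
  trainingInstanceType = "ml.m4.xlarge",
  trainingInstanceCount = 1,
  endpointInstanceType = "ml.m4.xlarge",
  endpointInitialInstanceCount = 1,
  trainingSparkDataFormat = "csv",   // or "com.databricks.spark.csv" on older Spark versions
  trainingContentType = Some("csv")) // tell XGBoost the training channel is CSV, not LibSVM

// num_round is a required XGBoost hyperparameter; setter name per the SDK's Param-based API.
estimator.setNumRound(25)
```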

`transform()` is trying to convert your DataFrame to LibSVM for inference because the `requestRowSerializer` is set to be `LibSVMRequestRowSerializer`: https://github.com/aws/sagemaker-spark/blob/81ac05625e86db577124d7c49d4cea7ec25d181f/sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms/XGBoostSageMakerEstimator.scala#L479-L480 If you want to send CSV, you should use this...
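In the Scala SDK, that means constructing the CSV serializer instead of the default LibSVM one, roughly like this (the details of the default column handling are my assumption):

```scala
import com.amazonaws.services.sagemaker.sparksdk.transformation.serializers.UnlabeledCSVRequestRowSerializer

// Serializes each row's features vector as a comma-separated line (text/csv)
// instead of the LibSVM text the default serializer produces.
val csvSerializer = new UnlabeledCSVRequestRowSerializer()

// Hand it to the estimator via its requestRowSerializer constructor parameter
// so the SageMakerModel produced by fit() sends CSV to the endpoint.
```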

@haowang-ms89 For PySpark, it's here: https://github.com/aws/sagemaker-spark/blob/81ac05625e86db577124d7c49d4cea7ec25d181f/sagemaker-pyspark-sdk/src/sagemaker_pyspark/transformation/serializers/serializers.py#L31-L40

@haowang-ms89 That's normal. The VectorAssembler encodes vectors sparsely to save memory when there are lots of zeros in the data. The rows with 27 are SparseVectors. The 27 is the...
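A quick illustration of how Spark prints a sparse vector, assuming 27 columns as in the data above; the leading number is the vector's size, not a feature value:

```scala
import org.apache.spark.ml.linalg.Vectors

// A 27-dimensional vector with non-zero values only at indices 0 and 3:
// size first, then the non-zero indices, then their values.
val sparse = Vectors.sparse(27, Array(0, 3), Array(1.0, 5.0))
println(sparse) // (27,[0,3],[1.0,5.0])

// The same data densely encoded stores all 27 values, zeros included.
val dense = Vectors.dense(1.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
println(dense)  // prints all 27 values: [1.0,0.0,0.0,5.0,0.0,...]
```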

That looks like it's still using the LibSVM serializer, not the `UnlabeledCSVRequestRowSerializer`. The LibSVM serializer validates the schema like this: https://github.com/aws/sagemaker-spark/blob/81ac05625e86db577124d7c49d4cea7ec25d181f/sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/serializers/SchemaValidators.scala#L28-L30 Did you set `xgboost_model.requestRowSerializer = UnlabeledCSVRequestRowSerializer()` before transforming?