sagemaker-spark

Newer versions of the library on maven central

Open jayantshekhar opened this issue 6 years ago • 5 comments

System Information

  • spark_2.2.0-1.2.5
  • Spark 2.3 and later

Describe the problem

EMR clusters that run Spark 2.3 and later ship with newer versions of the sagemaker-spark JARs.

However, those versions are not available on Maven Central: https://mvnrepository.com/artifact/com.amazonaws/sagemaker-spark

Is there a plan to release builds for Spark 2.3 and later to Maven Central? If not, are there any recommendations for running on later EMR versions?
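To make the mismatch concrete, here is a minimal sketch that checks whether an artifact version lines up with a cluster's Spark version. It assumes the version string listed under System Information follows a `spark_<spark version>-<library version>` layout; that layout is inferred from the one string above and is not a documented contract. The helper names `parse_artifact_version` and `matches_cluster` are hypothetical and not part of sagemaker-spark.

```python
import re
from typing import NamedTuple, Tuple


class ArtifactVersion(NamedTuple):
    spark: Tuple[int, ...]    # Spark version the artifact targets, e.g. (2, 2, 0)
    library: Tuple[int, ...]  # sagemaker-spark library version, e.g. (1, 2, 5)


# Assumption: versions look like "spark_<spark version>-<library version>".
_PATTERN = re.compile(r"^spark_(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)$")


def parse_artifact_version(version: str) -> ArtifactVersion:
    """Split an artifact version string into its Spark and library parts."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"unrecognized artifact version: {version!r}")
    spark, library = (
        tuple(int(part) for part in group.split(".")) for group in match.groups()
    )
    return ArtifactVersion(spark, library)


def matches_cluster(artifact: str, cluster_spark: str) -> bool:
    """Return True when the artifact's Spark major.minor equals the cluster's."""
    built_for = parse_artifact_version(artifact).spark[:2]
    running = tuple(int(part) for part in cluster_spark.split(".")[:2])
    return built_for == running


if __name__ == "__main__":
    print(parse_artifact_version("spark_2.2.0-1.2.5"))
    print(matches_cluster("spark_2.2.0-1.2.5", "2.3.0"))
```

Comparing only major.minor is a deliberate simplification; whether a patch-level difference matters depends on the library and is not something this sketch can tell you.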

Minimal repo / logs

jayantshekhar • Oct 21 '19 07:10

Unfortunately, we don't have any plans to upgrade the current Spark version, but we are always re-evaluating our roadmap based on customer feedback!

laurenyu • Oct 24 '19 23:10

Thanks for that Lauren!

Trying to understand it. Is Spark-SageMaker on the roadmap, and would you recommend that users continue building solutions on it?

Is there something else you would like us to use when integrating with SageMaker, especially when running jobs on EMR?

jayantshekhar • Nov 06 '19 04:11

Please refer to our documentation for Spark support: https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark.html

We are evaluating our roadmap and will add support for the latest version in the future.

nadiaya • Nov 11 '19 23:11

Thanks a lot Nadia! Will keep an eye on it and look forward to support for Spark 2.3 and Spark 2.4.

jayantshekhar • Dec 02 '19 08:12

I had issues running sagemaker_pyspark on EMR 5.22, per this closed issue. I was able to get it working with no issues and confirmed this with AWS tech support. The changes I had to apply are listed in my comments in the closed issue linked above. Figured I'd also post here in case it can benefit anyone else.

One question, though. It appears that the sagemaker_pyspark SDK is not updated as often as the SageMaker Python SDK. Should we not be concerned, because sagemaker_pyspark is a wrapper for the SageMaker Python SDK? Or is it indeed a lower priority on your roadmap and therefore receives less support?

ehameyie • Sep 25 '20 23:09