sagemaker-spark icon indicating copy to clipboard operation
sagemaker-spark copied to clipboard

protobuf.SageMakerProtobufFileFormat could not be instantiated

Open Karrthik-Arya opened this issue 2 years ago • 1 comments

Please fill out the form below.

System Information

  • PySpark version: 3.3.0

Describe the problem

I have been trying to use sagemaker along with spark to train on a dataset. I have been trying to run the example notebooks in a conda_python3 environment and while loading the dataset I keep facing the following problem. Could this be sorted?

Minimal repo / logs

Py4JJavaError: An error occurred while calling o110.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider com.amazonaws.services.sagemaker.sparksdk.protobuf.SageMakerProtobufFileFormat could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255)
	at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249)
	at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
	at scala.collection.TraversableLike.filter(TraversableLike.scala:347)
	at scala.collection.TraversableLike.filter$(TraversableLike.scala:347)
	at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:644)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:728)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:214)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
	at com.amazonaws.services.sagemaker.sparksdk.protobuf.SageMakerProtobufFileFormat.<init>(SageMakerProtobufFileFormat.scala:41)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 30 more

Karrthik-Arya avatar Jul 08 '22 11:07 Karrthik-Arya

Just commented on another thread with a fix:

https://github.com/aws/sagemaker-spark/issues/84#issuecomment-1186432966

saltyJeff avatar Jul 17 '22 08:07 saltyJeff