spark-solr java.lang.ClassNotFoundException: solr.DefaultSource

Open rashid-1989 opened this issue 6 years ago • 10 comments

Hi @kiranchitturi

I am using spark-solr 3.0.4 and Apache Spark 2.0.2 with Solr 7.3.0, and I am getting the above exception. The same exception persists with several other versions of the spark-solr connector. I made multiple attempts, changing the Spark and Solr versions per the recommendations at https://github.com/lucidworks/spark-solr, but nothing seems to work.
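For context, a minimal sketch of the kind of read that hits this data source lookup (the zkhost and collection values below are placeholders, not taken from my setup):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-solr read test").getOrCreate()

// "solr" resolves to solr.DefaultSource via Spark's data source lookup;
// the ClassNotFoundException below means that class is missing at runtime.
val df = spark.read.format("solr")
  .option("zkhost", "localhost:9983")    // placeholder ZooKeeper connect string
  .option("collection", "mycollection")  // placeholder collection name
  .load()
println(df.count())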

Below is the POM:

	<dependency>
		<groupId>com.lucidworks.spark</groupId>
		<artifactId>spark-solr</artifactId>
		<version>3.0.4</version>
	</dependency>
		
	<dependency>
		<groupId>com.sun</groupId>
		<artifactId>tools</artifactId>
		<version>1.8.0</version>
		<scope>system</scope>
		<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
	</dependency>

	<dependency>
		<groupId>jdk.tools</groupId>
		<artifactId>jdk.tools</artifactId>
		<version>1.8.0</version>
		<scope>system</scope>
		<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
	</dependency>

Exception occurred:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: solr. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
        at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
        at com.ezest.spark.solr.sparkSolrConnTest.searchFromSolrToSpark(sparkSolrConnTest.java:37)
        at com.ezest.spark.solr.sparkSolrConnTest.main(sparkSolrConnTest.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: solr.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)

rashid-1989 avatar Sep 27 '18 06:09 rashid-1989

Update: Please note that I am able to run it locally from Eclipse, with Solr installed on my Windows machine. The exception occurs when submitting it on the Hadoop cluster using:

bin/spark-submit --class <Project.Package>.sparkSolrConnTest /opt/sbt_jars/spark-solr-test-0.0.1-SNAPSHOT.jar --master local
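Note: spark-submit treats everything after the application jar as arguments to the application itself, so the trailing --master local above is most likely being ignored. The flags need to come before the jar, for example:

bin/spark-submit --master local --class <Project.Package>.sparkSolrConnTest /opt/sbt_jars/spark-solr-test-0.0.1-SNAPSHOT.jar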

rashid-1989 avatar Sep 27 '18 07:09 rashid-1989

Please check that the spark-solr jar is on the classpath or included in your shaded jar.
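For example, the shaded spark-solr artifact can be passed alongside the application jar; the path and version here are illustrative:

bin/spark-submit --master local \
  --jars /path/to/spark-solr-3.0.4-shaded.jar \
  --class <Project.Package>.sparkSolrConnTest \
  /opt/sbt_jars/spark-solr-test-0.0.1-SNAPSHOT.jar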

kiranchitturi avatar Sep 27 '18 16:09 kiranchitturi

Thanks @kiranchitturi. I got rid of the above exception by adding the spark-solr shaded jar to the spark-submit command. However, the job is unable to locate any of the Solr collections and throws an org.apache.solr.common.SolrException: Collection not found exception.

I am able to read data from these collections when executing the code from Eclipse, but not through spark-submit.

rashid-1989 avatar Sep 28 '18 07:09 rashid-1989

Please check that you have the right zkhost, and check the logs.
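One quick sanity check is to list the collections the cluster actually knows about via Solr's Collections API (host and port here are illustrative):

curl "http://solr-host:8983/solr/admin/collections?action=LIST"

The zkhost option must match the ZooKeeper ensemble the Solr nodes were started with, including any chroot, e.g. zk1:2181,zk2:2181/solr.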

kiranchitturi avatar Sep 28 '18 15:09 kiranchitturi

csvDF.write.format("solr").options(options).mode(org.apache.spark.sql.SaveMode.Overwrite).save

java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:347)
        at scala.None$.get(Option.scala:345)
        at com.lucidworks.spark.SolrRelation.insert(SolrRelation.scala:634)
        at solr.DefaultSource.createRelation(DefaultSource.scala:27)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        ... 49 elided
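For reference, a spark-solr write generally needs at least zkhost and collection set, and a None.get out of SolrRelation.insert may point at an option it could not resolve. A minimal sketch with placeholder values:

val options = Map(
  "zkhost" -> "localhost:9983",    // placeholder ZooKeeper connect string
  "collection" -> "mycollection",  // placeholder target collection
  "gen_uniq_key" -> "true"         // have spark-solr generate ids if the data has no key field
)
csvDF.write.format("solr")
  .options(options)
  .mode(org.apache.spark.sql.SaveMode.Overwrite)
  .save()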

uttamraj9 avatar Aug 04 '19 12:08 uttamraj9

I am getting the same as @rashid-1989. spark-submit's master is yarn in cluster mode, Spark 2.2.0.cloudera3 with com.lucidworks.spark:spark-solr:jar:3.3.4. The first microbatch works perfectly and writes into Solr, but from the second microbatch onward the driver throws ClassNotFoundException.

Hi Kiran, if you have resolved this, please let me know what your solution was.

My action was foreachRDD(x -> .....write().format("solr").....)

-verbose:class shows that solr.DefaultSource was loaded from my app's uber jar (solr.DefaultSource is present in it; I decompiled it and confirmed that nothing is wrong with the class).

java.lang.ClassNotFoundException: Failed to find data source: solr. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:546)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:87)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:87)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:467)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
        at com.app1.app3.app4.app5.app6.writeToSolr(app6.java:236)
        at com.app1.app3.app4.app5.app6.toDF(app6.java:119)
        at com.app1.app3.app4.app5.app6.app7.execute(app7.java:275)
        at com.app1.app3.app4.app5.app6.execute(app6.java:86)
        at com.app1.app3.spark.app1.OSF.execute(OSF.java:176)
        at com.app1.app3.spark.app3.OSF.call(OSF.java:121)
        at com.app1.app3.spark.app2.OSF.call(OSF.java:59)
        at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280)
        at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: solr.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22$$anonfun$apply$14.apply(DataSource.scala:530)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22$$anonfun$apply$14.apply(DataSource.scala:530)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22.apply(DataSource.scala:530)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22.apply(DataSource.scala:530)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:530)
        ... 43 more

nitinangadi avatar Sep 07 '19 12:09 nitinangadi

Make sure the jar is present on the driver classpath. Check the logs for the driver classpath.
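For example, in yarn cluster mode the shaded jar can be both shipped with the job and pinned to the driver classpath; the paths, version, and class name here are illustrative:

spark-submit --master yarn --deploy-mode cluster \
  --jars /path/to/spark-solr-3.3.4-shaded.jar \
  --conf spark.driver.extraClassPath=spark-solr-3.3.4-shaded.jar \
  --class com.example.MyStreamingApp \
  my-app.jar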

kiranchitturi avatar Mar 15 '20 05:03 kiranchitturi

Kiran, I'm trying to run a PySpark example and getting the same issue.

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python basic example") \
    .getOrCreate()

df = spark.read.format("solr").option("collection", "system_history").load()
print("No. of docs in logs collection {}".format(df.count()))
spark.stop()
spark.sparkContext._jvm.java.lang.System.exit(0)

ERROR


  File "/home/steph/Projects/fusion-spark-job-workbench/python_examples/count_docs.py", line 9, in <module>
    df = spark.read.format("solr").option("collection", "system_history").load()
  File "/home/steph/Projects/fusion/4.2.6/apps/spark-dist/python/pyspark/sql/readwriter.py", line 172, in load
    return self._df(self._jreader.load())
  File "/home/steph/Projects/fusion/4.2.6/apps/spark-dist/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/steph/Projects/fusion/4.2.6/apps/spark-dist/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/steph/Projects/fusion/4.2.6/apps/spark-dist/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o32.load.
: java.lang.ClassNotFoundException: Failed to find data source: solr. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)

The example works when I upload it to Fusion, but not when I run it outside Fusion for debugging. Which library am I missing?

svanschalkwyk avatar Jul 21 '20 22:07 svanschalkwyk

@svanschalkwyk did you manage to solve this?

phenilb avatar Sep 19 '22 12:09 phenilb

I believe it was a jar that was not installed. Check the classpath and determine which Spark jars need to be there.
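A sketch, assuming the count_docs.py script from the traceback above: attach the shaded spark-solr jar when launching (path and version are illustrative):

spark-submit --jars /path/to/spark-solr-3.9.0-shaded.jar count_docs.py

or, if the artifact can be resolved from a repository:

spark-submit --packages com.lucidworks.spark:spark-solr:3.9.0 count_docs.py

Note also that the script sets no zkhost option, which Fusion's environment may be supplying for you; outside Fusion it likely needs to be passed explicitly.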

svanschalkwyk avatar Sep 19 '22 15:09 svanschalkwyk