
java.lang.NoClassDefFoundError: org/apache/atlas/ApplicationProperties

Open terriblegirl opened this issue 4 years ago • 4 comments

Spark version 2.4.5, Atlas version 2.0.0. Building with Maven via mvn package -DskipTests succeeds. I copied 1100-spark_model.json to <ATLAS_HOME>/models/1000-Hadoop.

Then I execute:

spark-shell --jars spark-atlas-connector_2.11-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker

But the build succeeded, so why does it fail with java.lang.NoClassDefFoundError: org/apache/atlas/ApplicationProperties? What can I do?

terriblegirl avatar May 27 '20 06:05 terriblegirl

Looks like you missed supplying the application properties file.

shivsood avatar Jul 31 '20 20:07 shivsood

any updates on this?

dhineshns avatar Nov 25 '20 22:11 dhineshns

This is due to a missing jar package. You can use the spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar found under the /spark-atlas-connector-assembly/target directory. But I'm now having trouble with this one:

java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
  at com.hortonworks.spark.atlas.AtlasClientConf.get(AtlasClientConf.scala:50)
  at com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.clusterName(AtlasEntityUtils.scala:29)
  at com.hortonworks.spark.atlas.sql.CommandsHarvester$.clusterName(CommandsHarvester.scala:45)
  at com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:60)
  at com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
  at com.hortonworks.spark.atlas.sql.CommandsHarvester$InsertIntoHiveTableHarvester$.harvest(CommandsHarvester.scala:56)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:126)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
  at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
  at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
  at scala.Option.foreach(Option.scala:257)
  at com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
  at com.hortonworks.spark.atlas.AbstractEventProcessor$$anon$1.run(AbstractEventProcessor.scala:38)
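
That ClassCastException looks like a separate problem from the missing jar: AtlasClientConf.get casts the value read from atlas-application.properties to String, but Commons Configuration (which backs Atlas's ApplicationProperties) splits comma-separated values into a java.util.ArrayList. A minimal sketch of that behavior, assuming the offending property (e.g. atlas.cluster.name) contains a comma:

// Sketch only: requires a local atlas-application.properties with, for example:
//   atlas.cluster.name=primary,backup
import org.apache.commons.configuration.PropertiesConfiguration

val conf = new PropertiesConfiguration("atlas-application.properties")
val value = conf.getProperty("atlas.cluster.name")
println(value.getClass)               // class java.util.ArrayList, not java.lang.String
val name = value.asInstanceOf[String] // java.lang.ClassCastException, as in the trace above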

YanXiangSong avatar Dec 02 '20 05:12 YanXiangSong

> Looks like you missed supplying the application properties file.

This is partially correct. As per the README, atlas-application.properties needs to be discoverable by Spark, i.e. it needs to be on the classpath (in cluster mode, use --files to ship it to the executors).

You also need to either

  1. provide the Apache Atlas jars (atlas-intg) to spark-submit (along with many other jar dependencies), or
  2. use the fat jar under spark-atlas-connector-assembly/target (a quick classpath check is sketched just below this list).
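
To confirm which case you are hitting, one sanity check from spark-shell (a sketch, not part of the connector) is to load the missing class directly:

// Throws ClassNotFoundException if the Atlas client classes (atlas-intg) are not
// on the driver classpath, e.g. when only the thin connector jar was supplied.
Class.forName("org.apache.atlas.ApplicationProperties")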

NOTE: I am trying to make this work in Azure Databricks, which requires an init script.

I am only using RestAtlasClient.scala, which leverages AtlasClientConf.scala, which in turn uses ApplicationProperties.java. Take a look at ApplicationProperties.java in the Atlas repo: if ATLAS_CONFIGURATION_DIRECTORY_PROPERTY == null, it searches the classpath using ApplicationProperties.class.getClassLoader(), which seems completely useless here because that class loader belongs to the webapp part of Atlas.
So does that mean there is an assumption that Spark workloads run on the same VM as the Atlas web app? This is unclear to me.
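
Roughly paraphrased (a sketch of the lookup order described above, not the actual Atlas source):

// 1. If the "atlas.conf" system property is set, read atlas-application.properties from that directory.
// 2. Otherwise, fall back to scanning the classpath via the class's own class loader.
val confDir = System.getProperty("atlas.conf") // ATLAS_CONFIGURATION_DIRECTORY_PROPERTY
val url =
  if (confDir != null)
    new java.io.File(confDir, "atlas-application.properties").toURI.toURL
  else
    classOf[org.apache.atlas.ApplicationProperties].getClassLoader
      .getResource("atlas-application.properties")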

If you look at the static variables of the ApplicationProperties class, you can see that ATLAS_CONFIGURATION_DIRECTORY_PROPERTY is the Java system property "atlas.conf". This Stack Overflow post has a comment showing that if you set System.setProperty("atlas.conf", "<path to your properties>") in your Spark job, it will work.
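
For example (a sketch; the folder path is a placeholder and must contain atlas-application.properties):

// Set this before the SparkContext (and therefore the Atlas listeners) is created,
// so ApplicationProperties resolves the directory instead of scanning the classpath.
System.setProperty("atlas.conf", "/path/to/properties-folder/")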

Spark Conf

extra class path (not working)

I've tried setting the following spark conf options during spark-submit:

  • --conf "spark.driver.extraClassPath=path/to/properties-folder/*"
  • --conf "spark.executor.extraClassPath=path/to/properties-folder/*"

I tried multiple variations of folder paths, using the name of the file, not using the name of the file, using local:/folderpath, etc.
This does not work.
Log output:

21/03/30 18:54:46 INFO ApplicationProperties: Looking for atlas-application.properties in classpath
21/03/30 18:54:46 INFO ApplicationProperties: Looking for /atlas-application.properties in classpath
21/03/30 18:54:46 INFO ApplicationProperties: Loading atlas-application.properties from null

Summarized error:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
...
Caused by: org.apache.atlas.AtlasException: Failed to load application properties
...
Caused by: org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null

We can see that the url variable is null.

extra java options (working)

I then tried setting Java system properties, specifically atlas.conf. There are two ways to do this:

  1. using spark-defaults.conf. The default Spark properties file is $SPARK_HOME/conf/spark-defaults.conf
  2.  --conf "spark.driver.extraJavaOptions=-Datlas.conf=path/to/properties-folder/" 
     --conf "spark.executor.extraJavaOptions=-Datlas.conf=path/to/properties-folder/"
    

I opted for using --conf which worked successfully.
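
For reference, option 1 would amount to the same settings in $SPARK_HOME/conf/spark-defaults.conf (the folder path is a placeholder):

spark.driver.extraJavaOptions    -Datlas.conf=/path/to/properties-folder/
spark.executor.extraJavaOptions  -Datlas.conf=/path/to/properties-folder/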

~~Modified source code~~

~~I ended up setting the System property (tied to an environment variable) within the class constructor of AtlasClientConf and the AtlasClientConf object.~~ This didn't work either. Setting the Java system property via Spark conf is the solution.

kennydataml avatar Mar 30 '21 18:03 kennydataml