spark-atlas-connector
java.lang.NoClassDefFoundError: org/apache/atlas/ApplicationProperties
Spark version: 2.4.5
Atlas version: 2.0.0
Using Maven, I executed:
`mvn package -DskipTests`
Successful!! (screenshot of the successful build output)
Copied 1100-spark_model.json to <ATLAS_HOME>/models/1000-Hadoop.
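That copy step is a single command; a sketch, assuming the model JSON sits at the root of the connector checkout (its actual location may differ in your checkout):

```bash
# source path of 1100-spark_model.json is an assumption; adjust to your checkout
cp 1100-spark_model.json $ATLAS_HOME/models/1000-Hadoop/
```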
Then executed:

```
spark-shell --jars spark-atlas-connector_2.11-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker
```
But it compiled successfully! Why does it fail with java.lang.NoClassDefFoundError: org/apache/atlas/ApplicationProperties, and what can I do?
Looks like you missed supplying the application properties file.
Any updates on this?
This is due to a missing jar. Use the fat jar, spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar, from the spark-atlas-connector-assembly/target directory instead.
But I'm having trouble with this one:

```
java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
  at com.hortonworks.spark.atlas.AtlasClientConf.get(AtlasClientConf.scala:50)
  at com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.clusterName(AtlasEntityUtils.scala:29)
  at com.hortonworks.spark.atlas.sql.CommandsHarvester$.clusterName(CommandsHarvester.scala:45)
  at com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:60)
  at com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
  at com.hortonworks.spark.atlas.sql.CommandsHarvester$InsertIntoHiveTableHarvester$.harvest(CommandsHarvester.scala:56)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:126)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
  at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
  at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
  at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
  at scala.Option.foreach(Option.scala:257)
  at com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
  at com.hortonworks.spark.atlas.AbstractEventProcessor$$anon$1.run(AbstractEventProcessor.scala:38)
```
> Looks like you missed supplying the application properties file.
This is partially correct. As per the README, the atlas-application.properties file needs to be discoverable by Spark, i.e. it needs to be on the classpath (in cluster mode, use --files to ship it to the executors).

You also need to either:

- provide the Apache Atlas jars (atlas-intg, as well as many other jar dependencies) to spark-submit, or
- use the fat jar under spark-atlas-connector-assembly/target,

as shown in the sketch after this list.
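A minimal spark-submit sketch combining both pieces; the jar path, the properties path, and your-job.jar are placeholders, while the listener settings are the same ones from the spark-shell command earlier in this thread:

```bash
# sketch only: ship the properties file and use the fat jar;
# all paths and your-job.jar are placeholders
spark-submit \
  --jars spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --files /path/to/atlas-application.properties \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  your-job.jar
```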
NOTE: I am trying to make this work in Azure Databricks, which requires an init script. I am only using RestAtlasClient.scala, which leverages AtlasClientConf.scala, which in turn uses ApplicationProperties.java. Take a look at ApplicationProperties.java in the Atlas repo.
You can see that if ATLAS_CONFIGURATION_DIRECTORY_PROPERTY == null, it searches the classpath using ApplicationProperties.class.getClassLoader(), which seems completely useless because that class falls under the webapp section of Atlas. Does that mean there is an assumption that Spark workloads run on the same VM as the Atlas web app? This is unclear to me.

If you look at the static variables of the ApplicationProperties class, you can see that ATLAS_CONFIGURATION_DIRECTORY_PROPERTY is set to the Java system property "atlas.conf". A Stack Overflow post has a comment showing that if you set System.setProperty("atlas.conf", "<path to your properties>") in your Spark job, then it will work.
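A minimal Scala sketch of that idea; the path is a placeholder, and the property must be set before the SparkContext (and therefore the Atlas listeners) is created:

```scala
import org.apache.spark.sql.SparkSession

object AtlasConfExample {
  def main(args: Array[String]): Unit = {
    // "atlas.conf" must name the DIRECTORY containing
    // atlas-application.properties, not the file itself.
    // The path below is a placeholder.
    System.setProperty("atlas.conf", "/path/to/properties-folder/")

    // The connector's listeners are instantiated when the SparkContext
    // starts, so the property has to be set before this line runs.
    val spark = SparkSession.builder()
      .appName("atlas-conf-example")
      .getOrCreate()

    // ... job logic ...

    spark.stop()
  }
}
```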
### Spark Conf

#### extra class path (not working)

I've tried setting the following Spark conf options during spark-submit:
- `--conf "spark.driver.extraClassPath=path/to/properties-folder/*"`
- `--conf "spark.executor.extraClassPath=path/to/properties-folder/*"`
I tried multiple variations of the folder path: with the file name, without the file name, using local:/folderpath, etc. This does not work.
Log output:

```
21/03/30 18:54:46 INFO ApplicationProperties: Looking for atlas-application.properties in classpath
21/03/30 18:54:46 INFO ApplicationProperties: Looking for /atlas-application.properties in classpath
21/03/30 18:54:46 INFO ApplicationProperties: Loading atlas-application.properties from null
```
Summarized error:

```
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
...
Caused by: org.apache.atlas.AtlasException: Failed to load application properties
...
Caused by: org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null
```
We can see that the url variable is null.
#### extra java options (working)
I then tried setting Java system properties, specifically atlas.conf. There are two ways to do this (see the sketch after this list):

- using spark-defaults.conf; the default Spark properties file is $SPARK_HOME/conf/spark-defaults.conf
- `--conf "spark.driver.extraJavaOptions=-Datlas.conf=path/to/properties-folder/" --conf "spark.executor.extraJavaOptions=-Datlas.conf=path/to/properties-folder/"`
I opted for using --conf, which worked successfully.
#### ~~Modified source code~~

~~I ended up setting the System property (tied to an environment variable) within the class constructor and companion object of AtlasClientConf.~~

This didn't work either. Setting Java system properties in the Spark conf is the solution.