OpenMetadata
sparks.driver.extraClassPath is not a valid Spark property
Affected module
Database Services -> Delta Lake (Hive Metadata Database)
Describe the bug
jdbcDriverClassPath won't be loaded because sparks.driver.extraClassPath is not a valid Spark property. The correct property name is spark.driver.extraClassPath.
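For illustration, here is a minimal PySpark sketch (an assumption about the intent, not the actual ingestion code) showing why the typo matters: Spark silently ignores the unknown "sparks."-prefixed key, so the driver jar never reaches the JVM classpath.

from pyspark.sql import SparkSession

# Hypothetical illustration: wiring the jdbcDriverClassPath value from the
# ingestion YAML into the Spark session under the valid property name.
jar = "/opt/spark-3.5.0-bin-hadoop3/jars/mysql-connector-java-8.0.11.jar"

spark = (
    SparkSession.builder.appName("OpenMetadata")
    .config("spark.driver.extraClassPath", jar)  # valid: jar is loaded
    # .config("sparks.driver.extraClassPath", jar)  # typo: silently ignored
    .enableHiveSupport()
    .getOrCreate()
)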
To Reproduce
Run the following command in the ingestion container of docker-compose.yml:
metadata ingest -c delta_lake_ingest.yml
source:
  type: deltalake
  serviceName: "test"
  serviceConnection:
    config:
      type: DeltaLake
      metastoreConnection:
        metastoreDb: jdbc:mysql://172.16.240.1:17306/hive_metastore?createDatabaseIfNotExist=true&useSSL=false
        username: xxxx
        password: xxxx
        driverName: com.mysql.cj.jdbc.Driver
        jdbcDriverClassPath: /opt/spark-3.5.0-bin-hadoop3/jars/mysql-connector-java-8.0.11.jar
      appName: OpenMetadata
  sourceConfig:
    config:
      type: DatabaseMetadata
      markDeletedTables: true
      includeTables: true
      includeViews: true
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  loggerLevel: DEBUG
  openMetadataServerConfig:
    hostPort: "http://openmetadata-server:18085/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "xxxx"
    storeServiceConnection: true
Expected behavior
The ingestion finishes normally.

Actual behavior
Got "Warning: Ignoring non-Spark config property: sparks.driver.extraClassPath" followed by the error: The specified datastore driver ("com.mysql.cj.jdbc.Driver") was not found.

Log:
airflow@30ec1bf0bc33:/opt/airflow$ metadata ingest -c delta_lake_ingest.yml
[2024-04-29 15:25:32] DEBUG {metadata.Utils:execution_time_tracker:124} - GET executed in 0.01s
[2024-04-29 15:25:32] INFO {metadata.OMetaAPI:server_mixin:66} - OpenMetadata client running with Server version [1.3.1] and Client version [1.3.1.3]
[2024-04-29 15:25:32] DEBUG {metadata.Utils:execution_time_tracker:124} - GET executed in 0.0s
/home/airflow/.local/lib/python3.10/site-packages/pyspark/bin/load-spark-env.sh: line 68: ps: command not found
Warning: Ignoring non-Spark config property: sparks.driver.extraClassPath
:: loading settings :: url = jar:file:/home/airflow/.local/lib/python3.10/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/airflow/.ivy2/cache
The jars for the packages stored in: /home/airflow/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-c2ede5d7-c832-49c7-b7dc-33cc9c326679;1.0
confs: [default]
found io.delta#delta-core_2.12;2.3.0 in central
found io.delta#delta-storage;2.3.0 in central
found org.antlr#antlr4-runtime;4.8 in central
:: resolution report :: resolve 193ms :: artifacts dl 9ms
:: modules in use:
io.delta#delta-core_2.12;2.3.0 from central in [default]
io.delta#delta-storage;2.3.0 from central in [default]
org.antlr#antlr4-runtime;4.8 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-c2ede5d7-c832-49c7-b7dc-33cc9c326679
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/6ms)
24/04/29 15:25:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[2024-04-29 15:25:37] DEBUG {metadata.Utils:execution_time_tracker:124} - GET executed in 0.02s
24/04/29 15:25:38 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
24/04/29 15:25:38 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
24/04/29 15:25:39 ERROR Datastore: Exception thrown creating StoreManager. See the nested exception
Error creating transactional connection factory
org.datanucleus.exceptions.NucleusException: Error creating transactional connection factory
at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:214)
at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:162)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:285)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at org.datanucleus.NucleusContextHelper.createStoreManagerForProperties(NucleusContextHelper.java:133)
at org.datanucleus.PersistenceNucleusContextImpl.initialise(PersistenceNucleusContextImpl.java:422)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:817)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:334)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:213)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1975)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1970)
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1177)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:814)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:702)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521)
at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550)
at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:139)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:162)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3607)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3659)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3639)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1563)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1552)
at org.apache.spark.sql.hive.client.Shim_v0_12.databaseExists(HiveShim.scala:609)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$databaseExists$1(HiveClientImpl.scala:398)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:298)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:229)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:228)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:278)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:398)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:223)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:223)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:121)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:121)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:294)
at org.apache.spark.sql.internal.CatalogImpl.listDatabases(CatalogImpl.scala:74)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:330)
at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:203)
... 93 more
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.cj.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
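The two messages are the same bug seen from both ends: Spark drops the misspelled property, so the MySQL connector jar never makes it onto the driver classpath, and DataNucleus/BoneCP then fails to resolve com.mysql.cj.jdbc.Driver when opening the metastore connection. A quick hedged check that can be run from the same container (spark._jvm is py4j's view into the driver JVM; the jar path matches the YAML above):

from pyspark.sql import SparkSession

jar = "/opt/spark-3.5.0-bin-hadoop3/jars/mysql-connector-java-8.0.11.jar"

spark = (
    SparkSession.builder.appName("classpath-check")
    .config("spark.driver.extraClassPath", jar)  # correctly spelled
    .getOrCreate()
)

# Class.forName raises ClassNotFoundException if the jar is not on the
# driver classpath, mirroring the DataNucleus error in the log above.
spark._jvm.java.lang.Class.forName("com.mysql.cj.jdbc.Driver")
print("driver resolved")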
Version:
- OS: Ubuntu 22.04 (host os)
- Python version: 3.10.13
- OpenMetadata version: 1.3.1
- OpenMetadata Ingestion package version: 1.3.1.3
Additional context
I first tried to connect to Delta Lake via the UI settings, but the UI returned only the following error message. It would be nice if it also provided the stack trace :)
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Another related bug:
At the line linked below, this should be connection.metastoreConnection.username, not connection.metastoreConnection.metastoreDb:
https://github.com/open-metadata/OpenMetadata/blob/58992c2e24a380af7139666e3778590e53a47ea6/ingestion/src/metadata/ingestion/source/database/deltalake/connection.py#L74
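For reference, a hedged sketch of what the one-line fix might look like (the surrounding builder call and the JDO property name are assumptions for illustration; only the attribute swap reflects the linked line):

# Sketch only, not the verbatim code at connection.py#L74.
builder.config(
    "spark.hadoop.javax.jdo.option.ConnectionUserName",
    connection.metastoreConnection.username,  # fixed
    # was: connection.metastoreConnection.metastoreDb (the JDBC URL, not a username)
)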
@yush1ga thanks for reporting the issue and providing the fix. Really appreciated!