[FEATURE] Support SparkSQL on Standalone cluster
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Search before asking
- [X] I have searched in the issues and found no similar issues.
Describe the feature
Kyuubi currently provides an excellent interface for running Spark SQL on distributed Spark environments. However, it lacks official support for running Spark SQL on a standalone Spark cluster. This feature request proposes adding support for Spark SQL execution on Spark Standalone clusters, allowing users to deploy Kyuubi with more flexibility in their Spark cluster configurations.
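For reference, a minimal sketch of what pointing Kyuubi at a standalone master could look like in `$KYUUBI_HOME/conf/kyuubi-defaults.conf`; the master host and port below are placeholders, not values from a real deployment:

```properties
# Sketch only: replace the host and port with your standalone master's URL.
spark.master=spark://spark-master.example.com:7077
# Assuming engines are submitted in client mode (Spark's default deploy mode).
spark.submit.deployMode=client
```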
Motivation
No response
Describe the solution
No response
Additional context
No response
Are you willing to submit PR?
- [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
- [X] No. I cannot submit a PR at this time.
Technically, Standalone clusters are supported. What was the issue you actually encountered?
Hi @yaooqinn, the Apache Spark Kubernetes Operator supports deploying a Spark Standalone cluster on Kubernetes. However, I have come across an issue where Kyuubi has not yet implemented a Spark Standalone ApplicationOperation. Therefore, I created this feature request.
AFAIK, ApplicationOperation is irrelevant to submitting engines.
That's good to know. However, I just can't seem to configure Apache Kyuubi properly to work with my standalone Spark cluster.
I can't, for the life of me, figure out how to properly set the env vars like:
- JAVA_HOME
- SPARK_HOME
- SPARK_ENGINE_HOME
- HIVE_HOME
- HADOOP_CONF_DIR
Should these paths refer to Kyuubi's container filesystem or the Spark worker’s?
I’m using:
- bitnami/spark:3.5 for Spark
- apache/kyuubi:1.9.4-all for Kyuubi
If I use the built-in Spark binaries in the Kyuubi image, then I am able to establish a connection, but if I set spark.master to point to my Spark cluster's master node, I start getting all kinds of problems.
Right now I'm stuck at `...org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf`.
Any help would be really appreciated!
@yaooqinn Would starting a discussion be the right way to ask for help, or is it okay to post it here?
looks like missing hive jars?
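One quick way to check (a sketch; it assumes engines are spark-submitted in client mode from the Kyuubi container, so the relevant SPARK_HOME is the one on the Kyuubi side):

```bash
# Inside the Kyuubi container: see whether the Spark distribution that
# SPARK_HOME points to bundles Hive support jars (HiveConf ships in the hive-* jars).
echo "$SPARK_HOME"
ls "$SPARK_HOME/jars" | grep -i hive
```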
Yes, it definitely seems like it; I just assumed the official Kyuubi image would have every necessary jar installed by default. Sorry if it's a silly question, but the jars would be missing in my Kyuubi environment, not my Spark environment, right? I configured Kyuubi to submit jobs in client mode.
Yes, Kyuubi has to know at least SPARK_HOME to submit Spark apps. Unfortunately, the official Kyuubi image is not my area; I'm not sure whether it's bundled with a Spark release or has SPARK_HOME set correctly.
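A rough sketch of what `$KYUUBI_HOME/conf/kyuubi-env.sh` could look like when a Spark distribution is unpacked inside the Kyuubi container; the paths are placeholders for wherever your image actually puts things:

```bash
# Sketch only: adjust the paths to your image layout.
# These must exist on the Kyuubi server's filesystem, because Kyuubi runs
# spark-submit locally when it launches an engine.
export JAVA_HOME=/opt/java/openjdk
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/opt/hadoop/conf
```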
Do you know where I could find help? I thought this repository was the right place...