
[FEATURE] Support SparkSQL on Standalone cluster

Open nqvuong1998 opened this issue 1 year ago • 9 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the feature

Kyuubi currently provides an excellent interface for running Spark SQL on distributed Spark environments, but it lacks official support for running Spark SQL on a standalone Spark cluster. This feature request proposes adding support for Spark SQL execution on Spark Standalone clusters, allowing users to deploy Kyuubi with more flexibility in their Spark cluster configuration.

Motivation

No response

Describe the solution

No response

Additional context

No response

Are you willing to submit PR?

  • [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • [X] No. I cannot submit a PR at this time.

nqvuong1998 avatar Sep 26 '24 04:09 nqvuong1998

Technically, Standalone clusters are supported. What was the issue you actually encountered?
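
For example (a minimal sketch; the server address and master URL are placeholders), Spark configs such as spark.master can be passed per connection after the `#` in the JDBC URL:

```
# Hypothetical endpoints: Kyuubi server on localhost:10009, standalone
# master at spark://spark-master:7077. Configs after '#' go to the engine.
bin/beeline -u 'jdbc:hive2://localhost:10009/;#spark.master=spark://spark-master:7077'
```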

yaooqinn avatar Sep 26 '24 05:09 yaooqinn

Hi @yaooqinn , Apache Spark Kubernetes Operator supports deploying a Spark Standalone cluster on Kubernetes. However, I have come across an issue where Kyuubi has not yet implemented a Spark Standalone ApplicationOperation. Therefore, I created this feature request.

nqvuong1998 avatar Sep 26 '24 05:09 nqvuong1998

AFAIK, ApplicationOperation is irrelevant to submitting engines

yaooqinn avatar Sep 26 '24 07:09 yaooqinn

That's good to know. However, I just can't seem to configure Apache Kyuubi properly to work with my standalone Spark cluster.

I can't, for the life of me, figure out how to properly set environment variables such as:

  • JAVA_HOME
  • SPARK_HOME
  • SPARK_ENGINE_HOME
  • HIVE_HOME
  • HADOOP_CONF_DIR

Should these paths refer to Kyuubi's container filesystem or the Spark worker’s?
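
For context, this is roughly what I've been trying in conf/kyuubi-env.sh; the paths are guesses on my part, not something I found documented for the image:

```
# conf/kyuubi-env.sh inside the Kyuubi container -- all paths are guesses
export JAVA_HOME=/opt/java/openjdk
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
```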

I’m using:

  • bitnami/spark:3.5 for Spark
  • apache/kyuubi:1.9.4-all for Kyuubi

If I use the built-in Spark binaries in the Kyuubi image, then I am able to establish a connection, but if I set spark.master to point to my Spark cluster's master node, I start getting all kinds of problems.
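
Concretely, the change that triggers the problems is just pointing Kyuubi at the external master in conf/kyuubi-defaults.conf (master URL shown as a placeholder):

```
# conf/kyuubi-defaults.conf -- placeholder master URL for my standalone cluster
spark.master=spark://spark-master:7077
spark.submit.deployMode=client
```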

Right now I'm stuck at ...org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf.

Any help would be really appreciated!

medina325 avatar Jul 04 '25 09:07 medina325

@yaooqinn Would starting a discussion be the right way to ask for help, or is it okay to post it here?

medina325 avatar Jul 04 '25 09:07 medina325

Looks like missing Hive jars?

yaooqinn avatar Jul 04 '25 12:07 yaooqinn

Yes, it definitely seems like it. I just assumed the official Kyuubi image would have every necessary jar installed by default. Sorry if it's a silly question, but the jars would be missing in my Kyuubi environment, not my Spark environment, right? I configured Kyuubi to submit jobs in client mode.
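
In case it's useful, this is how I'm checking for them on the Kyuubi side (a sketch; assumes the standard Spark layout where jars live under $SPARK_HOME/jars):

```
# Inside the Kyuubi container: list Hive-related jars in the Spark distribution
ls "$SPARK_HOME/jars" | grep -i hive
# No matches would suggest the jars providing org.apache.hadoop.hive.conf.HiveConf
# (bundled in "with Hive" Spark builds) are absent from this distribution.
```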

medina325 avatar Jul 04 '25 12:07 medina325

Yes, Kyuubi has to know at least SPARK_HOME to submit Spark apps. Unfortunately, the official Kyuubi image is not my area; I'm not sure whether it is bundled with a Spark release and has SPARK_HOME set correctly.
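
A quick way to check inside the container (a sketch; assumes the image ships a Spark distribution at all):

```
# Confirm which Spark, if any, Kyuubi would use to submit engines
echo "$SPARK_HOME"
"$SPARK_HOME/bin/spark-submit" --version
```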

yaooqinn avatar Jul 07 '25 02:07 yaooqinn

Do you know where I could find help? I thought this repository was the right place...

medina325 avatar Jul 07 '25 12:07 medina325