
[FEATURE] Support SparkSQL on Standalone cluster

Open nqvuong1998 opened this issue 1 year ago • 9 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the feature

Kyuubi currently provides an excellent interface for running Spark SQL on distributed Spark environments, but it lacks official support for running Spark SQL on a standalone Spark cluster. This feature request proposes adding support for Spark SQL execution on Spark Standalone clusters, allowing users to deploy Kyuubi with more flexibility in their Spark cluster configuration.

Motivation

No response

Describe the solution

No response

Additional context

No response

Are you willing to submit PR?

  • [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • [X] No. I cannot submit a PR at this time.

nqvuong1998 avatar Sep 26 '24 04:09 nqvuong1998

Technically, Standalone clusters are supported. What was the issue you actually encountered?
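
For example (a minimal sketch; the server address and master URL are placeholders), Spark configs such as spark.master can be passed per connection after the `#` in the JDBC URL:

```
# Hypothetical endpoints: Kyuubi server on localhost:10009, standalone
# master at spark://spark-master:7077. Configs after '#' go to the engine.
bin/beeline -u 'jdbc:hive2://localhost:10009/;#spark.master=spark://spark-master:7077'
```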

yaooqinn avatar Sep 26 '24 05:09 yaooqinn

Hi @yaooqinn , Apache Spark Kubernetes Operator supports deploying a Spark Standalone cluster on Kubernetes. However, I have come across an issue where Kyuubi has not yet implemented a Spark Standalone ApplicationOperation. Therefore, I created this feature request.

nqvuong1998 avatar Sep 26 '24 05:09 nqvuong1998

AFAIK, ApplicationOperation is irrelevant to submitting engines

yaooqinn avatar Sep 26 '24 07:09 yaooqinn

That's good to know. However, I just can't seem to configure Apache Kyuubi properly to work with my standalone Spark cluster.

I can't, for the life of me, figure out how to properly set environment variables such as:

  • JAVA_HOME
  • SPARK_HOME
  • SPARK_ENGINE_HOME
  • HIVE_HOME
  • HADOOP_CONF_DIR

Should these paths refer to Kyuubi's container filesystem or the Spark worker’s?
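
For context, this is roughly what I've been trying in conf/kyuubi-env.sh; the paths are guesses on my part, not something I found documented for the image:

```
# conf/kyuubi-env.sh inside the Kyuubi container -- all paths are guesses
export JAVA_HOME=/opt/java/openjdk
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
```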

I’m using:

  • bitnami/spark:3.5 for Spark
  • apache/kyuubi:1.9.4-all for Kyuubi

If I use the built-in Spark binaries in the Kyuubi image, then I am able to establish a connection, but if I set spark.master to point to my Spark cluster's master node, I start getting all kinds of problems.
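
Concretely, the change that triggers the problems is just pointing Kyuubi at the external master in conf/kyuubi-defaults.conf (master URL shown as a placeholder):

```
# conf/kyuubi-defaults.conf -- placeholder master URL for my standalone cluster
spark.master=spark://spark-master:7077
spark.submit.deployMode=client
```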

Right now I'm stuck at ...org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf.

Any help would be really appreciated!

medina325 avatar Jul 04 '25 09:07 medina325

@yaooqinn Would starting a discussion be the right way to ask for help, or is it okay to post it here?

medina325 avatar Jul 04 '25 09:07 medina325

Looks like missing Hive jars?

yaooqinn avatar Jul 04 '25 12:07 yaooqinn

Yes, it definitely seems like it. I just assumed the official Kyuubi image would have every necessary jar installed by default. Sorry if it's a silly question, but the jars would be missing in my Kyuubi environment, not my Spark environment, right? I configured Kyuubi to submit jobs in client mode.
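
In case it's useful, this is how I'm checking for them on the Kyuubi side (a sketch; assumes the standard Spark layout where jars live under $SPARK_HOME/jars):

```
# Inside the Kyuubi container: list Hive-related jars in the Spark distribution
ls "$SPARK_HOME/jars" | grep -i hive
# No matches would suggest the jars providing org.apache.hadoop.hive.conf.HiveConf
# (bundled in "with Hive" Spark builds) are absent from this distribution.
```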

medina325 avatar Jul 04 '25 12:07 medina325

Yes, Kyuubi has to know at least SPARK_HOME to submit Spark apps. Unfortunately, the official Kyuubi image is not my area; I'm not sure whether it is bundled with a Spark release and has SPARK_HOME set correctly.
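
A quick way to check inside the container (a sketch; assumes the image ships a Spark distribution at all):

```
# Confirm which Spark, if any, Kyuubi would use to submit engines
echo "$SPARK_HOME"
"$SPARK_HOME/bin/spark-submit" --version
```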

yaooqinn avatar Jul 07 '25 02:07 yaooqinn

Do you know where I could find help? I thought this repository was the right place...

medina325 avatar Jul 07 '25 12:07 medina325