spark-operator icon indicating copy to clipboard operation
spark-operator copied to clipboard

feat: add gRPC Spark submitter plugin + proto (optional plugin for spark-submit via gRPC)

Open rahul810050 opened this issue 1 month ago • 1 comments

Summary

This PR adds a configurable gRPC plugin for submitting Spark applications (instead of invoking ${SPARK_HOME}/bin/spark-submit). Submitting via gRPC reduces per-submission overhead (no new process per submission) and enables improved controller throughput when reconciling many new SparkApplications.

What this PR adds / changes

  • proto/sparksubmit/spark_submit.proto

    • Proto service definition for a SparkSubmitService and messages for SubmitRequest/SubmitResponse.
  • Generated protobuf stubs (under proto/sparksubmit/):

    • spark_submit.pb.go
    • spark_submit_grpc.pb.go
  • internal/controller/sparkapplication/grpc_submitter.go

    • New GRPCSubmitter implementing the existing SparkApplicationSubmitter interface.
    • Uses the generated gRPC client to call a remote Spark submit service.
    • Fallback remains the existing SparkSubmitter which runs spark-submit locally.
  • internal/controller/sparkapplication/grpc_submitter_test.go

    • Unit tests for the gRPC submitter (server stubbed locally in tests).
    • Verifies expected request formation and error handling.
  • cmd/operator/controller/start.go

    • Wire a new controller flag and pass controller-wide option through to scheduled spark application reconciler options (--scheduled-sa-timestamp-precision).
    • (Helm chart and values updated to provide default)
  • charts/spark-operator-chart/templates/controller/deployment.yaml

    • Map the new CLI argument to the container args.
  • charts/spark-operator-chart/values.yaml

    • Default values for the controller option controller.scheduledSparkApplication.timestampPrecision.
  • Makefile

    • Proto generation targets and guidance; proto-gen target uses protoc and protoc-gen-go/protoc-gen-go-grpc.
  • Tests

    • Unit tests updated/added. Local test runs show controller-level tests passing for the modified packages.

Behavior & Backwards compatibility

  • Default behavior is unchanged: controller uses the existing SparkSubmitter which runs spark-submit.
  • The gRPC submitter is a new implementer of SparkApplicationSubmitter. It can be used by configuring the controller to use it (future config/flag or DI).
  • Timestamp precision behavior for ScheduledSparkApplication is configurable via:
    • CLI flag: --scheduled-sa-timestamp-precision
    • Chart value: controller.scheduledSparkApplication.timestampPrecision
    • Default remains nanos.

How to generate proto stubs locally

From repo root:

# ensure bin dir for plugin binaries
export GOBIN=$(pwd)/bin
mkdir -p "$GOBIN"
export PATH="$GOBIN:$PATH"

# install protoc plugins (one-time)
go install google.golang.org/protobuf/cmd/[email protected]
go install google.golang.org/grpc/cmd/[email protected]

# run protoc (proto source is in proto/sparksubmit/)
protoc -I proto \
  --go_out=proto --go_opt=paths=source_relative \
  --go-grpc_out=proto --go-grpc_opt=paths=source_relative \
  proto/sparksubmit/spark_submit.proto

Files changed

Added:

  • proto/sparksubmit/spark_submit.proto
  • proto/sparksubmit/spark_submit.pb.go (generated)
  • proto/sparksubmit/spark_submit_grpc.pb.go (generated)
  • internal/controller/sparkapplication/grpc_submitter.go
  • internal/controller/sparkapplication/grpc_submitter_test.go

Modified:

  • Makefile (proto-gen bits)
  • cmd/operator/controller/start.go
  • charts/spark-operator-chart/templates/controller/deployment.yaml
  • charts/spark-operator-chart/values.yaml
  • go.mod (if needed for generated packages)
  • internal/controller/scheduledsparkapplication/controller_test.go (tests)

Fixed Issue

#2746

Testing and status

  • Unit tests for internal/controller/sparkapplication pass locally in my environment.
  • protoc and go-based codegen used to generate stubs; go test passes for the updated packages.

Checklist (for reviewers)

  • [ ] API (proto) looks reasonable (field names, message types).
  • [ ] gRPC client usage is robust (timeouts, retries, error handling).
  • [ ] Unit tests are sufficient to cover main code paths.
  • [ ] Helm chart / CLI wiring is correct and default values sensible.
  • [ ] Backwards compatibility verified (default uses existing submitter).

rahul810050 avatar Nov 22 '25 15:11 rahul810050

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign jacobsalway for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow[bot] avatar Nov 22 '25 15:11 google-oss-prow[bot]

Hii @ChenYi015 could you please review it whenever you get a chance ??

rahul810050 avatar Dec 12 '25 04:12 rahul810050

Changes from #2742 has been included in this PR and some other PRs you have created, please do not mix up changes from different PRs.

ChenYi015 avatar Dec 12 '25 09:12 ChenYi015

We also need to discuss the implementation details and potential ramifications. Let's discuss in the issue before jumping into implementing it.

nabuskey avatar Dec 12 '25 12:12 nabuskey

Changes from #2742 has been included in this PR and some other PRs you have created, please do not mix up changes from different PRs.

Yeah, I’ve been waiting for that PR to get merged, but it’s taking quite a long time. I tried rebasing and ran all the necessary commands to fix it, but it’s still pulling in commits from both branches. What should I do? Please help me out!!!

rahul810050 avatar Dec 13 '25 05:12 rahul810050

We also need to discuss the implementation details and potential ramifications. Let's discuss in the issue before jumping into implementing it.

Sure @nabuskey !! I’ll keep that in mind and make sure to discuss the implementation details in the issue before proceeding. If you prefer I can close this PR and we can first discuss the idea in the issue thread. Once we reach a conclusion, I’ll open a new PR based on that discussion. Does that sound good?

rahul810050 avatar Dec 13 '25 05:12 rahul810050