Add VolcanoWorker to support Volcano Job submission via Kubernetes work pool and test
Apologies for the noise on the earlier PR https://github.com/PrefectHQ/prefect/pull/17880. I am sry While iterating on the Volcano worker I accidentally pushed several unrelated commits that polluted the diff. I’ve started fresh from the latest upstream/main, cherry-picked only the relevant changes, and added a focused unit-test suite.
The previous PR has been closed in favor of this clean one. Changes included here (3 files):
volcanoworker.py – new VolcanoWorker implementation
init.py – registers the new worker type
tests/test_volcano_worker.py – unit & smoke tests covering job creation, pod discovery, and state polling
Please let me know if you’d like additional work
CodSpeed Performance Report
Merging #17909 will not alter performance
Comparing zyn71:feat/volcano_clean (5c10a28) with main (df10951)
Summary
✅ 2 untouched benchmarks
hi,@reneleonhardt @zzstoatzz (we have argued before) Sorry to nudge, but I wanted to gently follow-up on this PR. It’s been open for three days, the checks (CI, pre-commit, dev env) have all passed, and the diff is limited to three files.And it has already been applied into out porduction. If there’s anything I can clarify or adjust, just let me know—I’m happy to turn it around quickly.Thanks a lot for your time and for keeping the review queue moving!
I'm no maintainer sorry. I suggest using a free AI to save contributors and reviewers time: https://www.coderabbit.ai/blog/how-linux-foundation-used-ai-code-reviews-to-reduce-manual-bottlenecks-in-oss
Hey @zyn71, neither @zzstoatzz nor I are very familiar with Volcano, and we'll want to test this functionality before merging. Could you provide instructions on how we can test these changes locally?
Hi @desertaxle — thank you for the quick feedback! Below is a full, single-file guide my team follows to test the new VolcanoWorker locally. Everything has been validated on real clusters;
1 Get server and workpool
Create a conventional prefect server and prefect-kubernetes workpool.So far nothing is Volcano-specific.
2 Convert K8s work pool to “Volcano mode” manully
Open K8s Work Pool → Advanced → Job Manifest and paste the manifest below.
It’s the normal template with apiVersion: batch.volcano.sh/v1alpha1 and Volcano-only fields added:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
labels: "{{ labels }}"
namespace: "{{ namespace }}"
generateName: "{{ name }}-"
spec:
schedulerName: volcano
queue: your-volcano-queue
maxRetry: 3
minAvailable: 1
tasks:
- name: pod1
replicas: 1
policies:
- event: TaskCompleted
action: CompleteJob
template:
metadata:
labels:
job-name: "{{ name }}"
volcano.sh/job-name: "{{ name }}"
spec:
restartPolicy: Never
containers:
- name: c1
image: "{{ image }}"
command: "{{ command }}"
env: "{{ env }}"
imagePullPolicy: "{{ image_pull_policy }}"
3 Start a VolcanoWorker
Before merging into main branch, we must build a new image including above there changes files.(BASE is the newest version prefect)
docker build -t your_image .
We can uses helm chart to create a volcanoworker
helm pull prefect/prefect-worker # fetch chart
tar -xzf prefect-worker-*.tgz && cd prefect-worker
# edit values.yaml (connect to prefect server by your owns)➜
config:
type: volcano
workPool:"your work pool"
image:
repository: your image
prefectTag: your tag
helm install prefect-worker . -f values.yaml -n your-namespace
No extra image build is needed after merging
4 Deploy & run a flow
Any deployment assigned to my-k8s-pool now spawns a real Volcano Job.
Logs, exit-codes, and state reporting behave exactly like the stock K8s worker.nothing is Volcano-specific
Extra notes
-
Zero intrusion:
VolcanoWorkersubclassesKubernetesWorker; normal paths are untouched unlessapiVersionis Volcano. -
Tests: added
tests/test_volcanoworker.pycovering job creation, pod discovery, and state-watching. - Happy to iterate further — just let me know what else you need!
Thanks again for your time!
Thanks for that guide @zyn71! Do I need to install Volcano in my cluster before using the Volcano worker? Also, I think it'd be best if this worker didn't require manual changes to the base job template in the default case. Would you be able to add _get_default_job_manifest_template similar to the existing KubernetesWorker?
hi again @desertaxle !Yes, Volcano must be installed in the cluster before the VolcanoWorker can be used.
Because this PR has not been merged yet, you also need to build a custom worker image (step 3 in my guide): simply copy volcanoworker.py and the updated __init__.py into the current Prefect package, build the image, and then start the worker with --type volcano.
Regarding your concern about avoiding manual edits to the base job template: Prefect validates the job manifest in several layers, not only inside the worker. Making the worker auto‑switch templates would therefore require extensive changes—perhaps even a brand‑new integration. Considering that our goal is just to extend prefect‑kubernetes with Volcano support, and since VolcanoWorker already inherits from KubernetesWorker, the current solution remains flexible and easy to use.
If there is anything else you need, please let me know. Thank you!
@zyn71 I disagree that it would take extensive updates to support a different default manifest for the Volcano worker. The primary validation of the manifest happens in KubernetesWorkerJobConfiguration._validate_job_manifest, which you are already overriding in this PR. I think it makes more sense to have that validation be Volcano-specific and have a default manifest that allows the Volcano worker to be used without manual configuration.
I think one thing that would help is that the Volcano worker doesn't need to be compatible with vanilla Kubernetes jobs. I think the overall design of the new worker would be greatly improved if you dropped that compatibility.
hi @desertaxle !I totally understand your point—the first thing I tried was exactly what you’re suggesting.
However, the default manifest is injected at the work‑pool level, not at the worker level.
Right now Prefect only ships a single work‑pool type (kubernetes). When you create a
Kubernetes work pool, the UI automatically seeds a Kubernetes Job template; later, the
worker validates whatever manifest lives in that pool.
If we want to eliminate all manual edits, we would first need to introduce a brand‑new Volcano work‑pool type (or extend the existing pool so it can return a Volcano‑flavored template). That’s only part of the story, though:
- During local testing I noticed that validation happens in several places.
In addition to the worker’sJobConfiguration._validate_job_manifest, the flow‑run machinery also validates the manifest that comes from the work pool, and other internal modules touch it as well. - So even with a Volcano work pool, we’d still have to propagate changes across multiple code paths, which feels like a fairly invasive refactor.
Because Volcano can sit on top of standard Kubernetes primitives, my goal was to reuse
the existing kubernetes work‑pool infrastructure and just add a lightweight
VolcanoWorker on top. That avoids a large amount of churn in the core codebase while
still giving users an extra scheduling option.
@zyn71 we have processes in place to add new supported work pools to the UI, so it will be possible to create a Volcano worker from the UI once the Volcano worker is added to prefect-kubernetes. In the meantime, workers can create work pools when they start up, so running prefect worker start --type volcano WORK_POOL_NAME can create a Volcano work pool. That's why I'm insistent that the Volcano worker should have its own default manifest.
I appreciate your desire not to introduce new churn to the codebase, but I don't think you'll introduce much churn since you're adding something new. If the Volcano worker is going to share code with the Kubernetes worker, I'd like them to be as loosely coupled as possible. One way you can do that is by ensuring the Volcano worker is only responsible for running Volcano jobs.
Hi @desertaxle! Thanks a lot for the suggestions and guidance. I’ve now added a _get_default_volcano_job_manifest_template() to VolcanoWorker, so we have anative Volcano base manifest.
Following your advice, once the UI can create a Volcano work‑pool (with itsown default job manifest), this worker will be much more self‑contained andloosely coupled with the existing Kubernetes worker.
Could you take a look and let me know if this approach works for you? I’m happy to make further adjustments if needed. Thanks!
Hey @zyn71, we made some pretty significant changes to the Kubernetes worker in https://github.com/PrefectHQ/prefect/pull/18004 that will require changes to the worker you're working on here.
The big change is splitting event emission and crash detection responsibilities into a separate observer that starts alongside the worker. You might be able to reuse the observer, but I'm not sure if the jobs event handler will work with Volanco jobs. Let me know if you have any questions on the new structure!
hi! @desertaxle,Sorry for the delayed response - I've been quite busy lately and this took longer than expected. First of all, I want to say that the refactoring work you did in #18004 is excellent! It addresses many issues and significantly improves the reliability and architecture of the Kubernetes worker. I've successfully adapted my VolcanoWorker to work with the new structure. The good news is that I was able to reuse your observer implementation without any modifications - it works perfectly with Volcano jobs! Here's what I did to make it compatible:
-
Removed old monitoring logic: Eliminated the custom
_watch_job(),_get_job(), and event replication code since the observer now handles all of this. -
Simplified the
run()method: Now it only creates the job and returns immediately, letting the observer handle monitoring. -
Fixed Pod label propagation: This was the key part - I updated the
_propagate_labels_to_pod()method to correctly handle Volcano job's pod template structure (spec.tasks[0].template.metadata.labelsinstead ofspec.template.metadata.labels), ensuring thatprefect.io/flow-run-idlabels are properly propagated to pods so the observer can identify them. -
Updated job manifest template: Made sure the Volcano job template includes
{{ labels }}placeholders in the pod template. I've tested the implementation and confirmed that:
- The observer correctly identifies and monitors Volcano job pods
- Event replication works as expected
- Crash detection functions properly
- Both standard Kubernetes jobs and Volcano jobs work seamlessly Thanks again for the great work on the refactoring - it makes the codebase much cleaner and more robust!
Hi! Our MR hasn’t made any progress in a week. Do you have any suggestions, or is there anything else I can do to move it forward or improve it?
hi @zyn71 - sorry for the delay here. desertaxle is out on leave right now. one of us will review your updates as soon as possible
This pull request is stale because it has been open 14 days with no activity. To keep this pull request open remove stale label or comment.
This pull request was closed because it has been stale for 14 days with no activity. If this pull request is important or you have more to add feel free to re-open it.
hi,its been a long time.can i do sth to keep the pr forward?