flyte icon indicating copy to clipboard operation
flyte copied to clipboard

Support Volcano scheduler integration by enabling PodGroup creation for podTask in Propeller

Open zhuo-zhi opened this issue 7 months ago • 13 comments

Why are the changes needed?

Support Volcano scheduler integration by enabling PodGroup creation for podTask in Propeller. Volcano is a Kubernetes-native batch scheduler optimized for high-performance, AI/ML, and big data workloads.

Before this change, all podTasks associated with a FlyteWorkflow shared a single PodGroup, which was automatically created just once by the Volcano controller when the first podTask was launched. Subsequent podTasks reused this PodGroup, which introduced limitations. Specifically, different podTasks could not be scheduled with different Volcano queues or job priorities, since the shared PodGroup enforced a single queue and priority for all.

This change enables Propeller to create individual Volcano PodGroups for each podTask, allowing them to specify distinct queues and job priorities for scheduling.

What changes were proposed in this pull request?

  1. The plugin manager for the k8s pod plugin creates Volcano PodGroups before creating the corresponding pods for podTasks. The PodGroup will have the same name as the pod, ensuring unique mapping between PodGroup and Pod. It also adds the PodGroup name to the pod’s annotations, ensuring that the Volcano PodGroup controller does not create the PodGroup again.
  2. Each PodGroup inherits the queue and priority from its Pod.
  3. Introduced a feature flag enable-create-pod-group-for-pod in the config to control whether this feature is enabled. The default value is false to ensure backward compatibility.
  4. Regarding PodGroup garbage collection, the PodGroup is deleted by Kubernetes' GC. Since the PodGroup's owner reference points to a FlyteWorkflow, the deletion process follows a cascading effect. When the FlyteWorkflow is deleted, Kubernetes GC removes the associated PodGroup due to the absence of a living owner reference.

How was this patch tested?

Unit tests added Used internally at Linkedin.

Labels

Please add one or more of the following labels to categorize your PR:

  • added: For new features.
  • changed: For changes in existing functionality.
  • deprecated: For soon-to-be-removed features.
  • removed: For features being removed.
  • fixed: For any bug fixed.
  • security: In case of vulnerabilities

This is important to improve the readability of release notes.

Setup process

Screenshots

Check all the applicable boxes

  • [x] I updated the documentation accordingly.
  • [x] All new and existing tests passed.
  • [x] All commits are signed-off.

Related PRs

Docs link

Summary by Bito

This pull request enhances the Volcano scheduler support by allowing the creation of individual PodGroups for each podTask in Propeller, improving scheduling flexibility with distinct queues and job priorities. A new configuration option ensures backward compatibility, and relevant unit tests have been added to validate and verify the new functionality.

zhuo-zhi avatar May 17 '25 01:05 zhuo-zhi

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

welcome[bot] avatar May 17 '25 01:05 welcome[bot]

Hi reviewers, could you please help me add the added label to this PR? It seems I don’t have the access. Thank you!

zhuo-zhi avatar May 17 '25 01:05 zhuo-zhi

Codecov Report

Attention: Patch coverage is 60.00000% with 32 lines in your changes missing coverage. Please review.

Project coverage is 58.50%. Comparing base (8e0416d) to head (18b4387). Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
...er/pkg/controller/nodes/task/k8s/plugin_manager.go 62.33% 23 Missing and 6 partials :warning:
flyteplugins/go/tasks/plugins/k8s/pod/plugin.go 0.00% 2 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6457      +/-   ##
==========================================
+ Coverage   58.47%   58.50%   +0.02%     
==========================================
  Files         940      940              
  Lines       71584    71658      +74     
==========================================
+ Hits        41861    41920      +59     
- Misses      26540    26551      +11     
- Partials     3183     3187       +4     
Flag Coverage Δ
unittests-datacatalog 59.03% <ø> (ø)
unittests-flyteadmin 56.22% <ø> (-0.02%) :arrow_down:
unittests-flytecopilot 30.99% <ø> (ø)
unittests-flytectl 64.72% <ø> (ø)
unittests-flyteidl 76.12% <ø> (ø)
unittests-flyteplugins 61.06% <0.00%> (+0.11%) :arrow_up:
unittests-flytepropeller 54.81% <62.33%> (+0.02%) :arrow_up:
unittests-flytestdlib 64.04% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 17 '25 13:05 codecov[bot]

Without looking to deeply at this I feel that the plugin manager code probably shouldn't be doing exceptional work for a third party Kubernetes scheduler. My intuition is that you would want to create a new plugin for Volcano and configure your Propeller instance to use that as the default plugin.

Sovietaced avatar May 17 '25 13:05 Sovietaced

Without looking to deeply at this I feel that the plugin manager code probably shouldn't be doing exceptional work for a third party Kubernetes scheduler. My intuition is that you would want to create a new plugin for Volcano and configure your Propeller instance to use that as the default plugin.

@Sovietaced Thanks for the review.

All the changes introduced in this PR are gated behind a configuration flag, which is disabled by default. This ensures that the plugin manager will not perform this functionality unless explicitly enabled.

This feature is primarily designed to support the current k8s Pod plugin, and more broadly, any future plugins that launch Pod resources. If I understand correctly, the current code structure assumes each plugin is responsible for managing only one k8s object. This assumption is why a new plugin doesn't work because additional logic is required in the plugin manager to create PodGroups before launching Pods.

For other plugins such as TFJob, PyTorchJob, and MPIJob, their corresponding operators are already integrated with Volcano and can create PodGroups on their own.

zhuo-zhi avatar May 18 '25 05:05 zhuo-zhi

All the changes introduced in this PR are gated behind a configuration flag, which is disabled by default. This ensures that the plugin manager will not perform this functionality unless explicitly enabled.

I understand. My main concern is that core flyte propeller internals shouldn't have references to third party elements like volcano. You're using a feature flag to branch the logic in the generic plugin manager code to do something very specific for volcano.

I will wait for more of the Flyte core maintainers to look at this but I'm thinking it might make more sense to rearchitect Flyte to allow plugins to build multiple resources.

Sovietaced avatar May 18 '25 18:05 Sovietaced

All the changes introduced in this PR are gated behind a configuration flag, which is disabled by default. This ensures that the plugin manager will not perform this functionality unless explicitly enabled.

I understand. My main concern is that core flyte propeller internals shouldn't have references to third party elements like volcano. You're using a feature flag to branch the logic in the generic plugin manager code to do something very specific for volcano.

I will wait for more of the Flyte core maintainers to look at this but I'm thinking it might make more sense to rearchitect Flyte to allow plugins to build multiple resources.

I see your point about making the plugin manager code more generic. I'm not sure how much effort it would take, or whether there are concrete plans to rearchitect the Flyte plugins to support building multiple resources. In the meantime, would it be possible to include this change in an earlier release so users can opt in earlier if needed? It could also help validate the effectiveness of a potential future rearchitecture. More feedback from the Flyte core maintainers would also be greatly appreciated.

zhuo-zhi avatar May 18 '25 23:05 zhuo-zhi

@zhuo-zhi can we add docs here on how to prepare the K8s cluster for this integration?

davidmirror-ops avatar Jun 03 '25 21:06 davidmirror-ops

@zhuo-zhi can we add docs here on how to prepare the K8s cluster for this integration?

@davidmirror-ops Thanks for the review. I've created a separate PR to document the preparation steps: https://github.com/unionai/docs/pull/372

zhuo-zhi avatar Jun 09 '25 00:06 zhuo-zhi

All the changes introduced in this PR are gated behind a configuration flag, which is disabled by default. This ensures that the plugin manager will not perform this functionality unless explicitly enabled.

I understand. My main concern is that core flyte propeller internals shouldn't have references to third party elements like volcano. You're using a feature flag to branch the logic in the generic plugin manager code to do something very specific for volcano.

I will wait for more of the Flyte core maintainers to look at this but I'm thinking it might make more sense to rearchitect Flyte to allow plugins to build multiple resources.

I agree with everything @Sovietaced said, I also feel we cannot have such specialized logic in the plugin manager itself, even with a feature flag :/

fg91 avatar Jun 09 '25 07:06 fg91

All the changes introduced in this PR are gated behind a configuration flag, which is disabled by default. This ensures that the plugin manager will not perform this functionality unless explicitly enabled.

I understand. My main concern is that core flyte propeller internals shouldn't have references to third party elements like volcano. You're using a feature flag to branch the logic in the generic plugin manager code to do something very specific for volcano. I will wait for more of the Flyte core maintainers to look at this but I'm thinking it might make more sense to rearchitect Flyte to allow plugins to build multiple resources.

I agree with everything @Sovietaced said, I also feel we cannot have such specialized logic in the plugin manager itself, even with a feature flag :/

@fg91 @Sovietaced Thanks for the insights. I understand your concern. I’m wondering if there are any actionable steps you’d suggest I take to help move the PR forward?

Rearchitecting Flyte to support plugins that build multiple resources appears to be a significant change, as it would likely require refactoring all existing plugins along with the plugin manager. If that’s the only viable path for integration, would it be possible for me to get some support on this?

zhuo-zhi avatar Jun 11 '25 18:06 zhuo-zhi

@fg91 @Sovietaced Thanks for the insights. I understand your concern. I’m wondering if there are any actionable steps you’d suggest I take to help move the PR forward?

I would defer to the Union folks or people that understand the Propeller architecture deeply.

Sovietaced avatar Jun 12 '25 15:06 Sovietaced

I agree that re-architecting the plugin interface to allow for "extra resources" would be a bigger effort. I don't see a clean way to integrate the creation of resources like PodGroup into Flyte without this though.

The adapted plugin interface could in theory look like this:

type Plugin interface {
	// Defines a func to create a query object (typically just object and type meta portions) that's used to query k8s
	// resources.
	BuildIdentityResource(ctx context.Context, taskCtx pluginsCore.TaskExecutionMetadata) (client.Object, error)

	// Defines a func to create the full resource object that will be posted to k8s.
	BuildResource(ctx context.Context, taskCtx pluginsCore.TaskExecutionContext) (client.Object, error)

	// Analyses the k8s resource and reports the status as TaskPhase. This call is expected to be relatively fast,
	// any operations that might take a long time (limits are configured system-wide) should be offloaded to the
	// background.
	GetTaskPhase(ctx context.Context, pluginContext PluginContext, resource client.Object) (pluginsCore.PhaseInfo, error)

	// Properties desired by the plugin
	GetProperties() PluginProperties

        // New
        // Defines a func to create query objects for the extra resources e.g. for aborting
        BuildExtraIdentityResources(...) ([]client.Object, error)
        
        // Defines a func to create the full extra resource objects that will be posted to k8s.
        BuildExtraResources(...) ([]client.Object, error)
}

Which extra resources would be created would depend on the user configuration for the plugin returned by GetProperties().

Flytepropeller would not do any state monitoring for those extra resources, just create them along the creation of the main task resource. And it would delete them on abort. In theory we could omit the BuildExtraIdentityResources if propeller injected an owner reference to the main task object into the extra resources so they would get cleaned up automatically if the main task object was deleted. That would leave us with the additional BuildExtraResources.

I don't think the existing plugins would create that much work because their respective BuildExtraResources functions would just do nothing initially. But the creation of the resources (and their deletion or injection of owner refs) would have to be implemented in the k8s plugin.

All this is just brainstorming though, something like this would definitely require an RFC (I could help with the process) and it also definitely would need early input from Union folks like @eapolinario @wild-endeavor, @kumare3.


Alternative approach:

If your main goal was not officially integrating Flyte with Volcano but you/your organization being able to use Flyte pod tasks with Volcano, I think a much lower hanging fruit (albeit not an official integration into flyte) would be to handle the PodGroup creation in a (mutating) admission webhook that gets triggered when flytepropeller creates the pods. You could deploy such a service along side Flyte. I've done something like this in the past for other resources.

fg91 avatar Jun 12 '25 16:06 fg91

Cc @troychiu this was the one

kumare3 avatar Jun 17 '25 22:06 kumare3