Knative options to support shared GPU resources with PSP

Open YingjingLu opened this issue 3 years ago • 10 comments

/area API /kind spec

Ask your question here:

In our use cases, enabling GPU resource sharing would add huge value for running serverless GPU workloads, since many of the workloads occupy only a fraction of a GPU.

There are existing solutions that let Kubernetes share fractions of a single GPU, such as NVIDIA MPS (Multi-Process Service). However, NVIDIA MPS requires hostIPC and hostPID to function properly.

However, PSP configuration such as enabling hostIPC is not supported in the Knative Serving spec, and it does not seem to be included in the extensions either. From this discussion, it seems that adding PSP support would not be considered, since PSPs are on a deprecation path?

We would like to know whether Knative has any plans or solutions to support enabling hostIPC or hostPID for pods, or alternatives that achieve the same effect. Thanks.

YingjingLu, Apr 07 '22

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot], Jul 07 '22

Hey, has anyone addressed this? I also need to enable hostIPC to use KServe with vGPU.

pablocael, Sep 15 '22

A couple of things to note here: with the move to Pod Security Standards in Kubernetes v1.25+, sharing host namespaces would be disallowed under any policy other than "privileged". It could also run into issues if we were to enable user namespaces in the future.

That said, it's probably worth discussing whether this is something to consider enabling (behind a feature gate). cc @dprotaso and @evankanderson for any API or security concerns.
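To make the Pod Security Standards concern concrete, here is a minimal Go sketch of the Baseline "Host Namespaces" control, which requires spec.hostNetwork, spec.hostPID, and spec.hostIPC to be unset or false. Restricted includes every Baseline control, while Privileged applies none of them. The `PodSpec` struct, `Level` type, and `admits` function are hypothetical stand-ins, not the real k8s.io/api types or the Pod Security admission code.

```go
package main

import "fmt"

// Level is a simplified Pod Security Standards policy level.
type Level int

const (
	Privileged Level = iota
	Baseline
	Restricted
)

// PodSpec is a hypothetical, trimmed-down stand-in for the host-namespace
// fields of a Kubernetes PodSpec. A nil pointer means the field is unset.
type PodSpec struct {
	HostNetwork *bool
	HostPID     *bool
	HostIPC     *bool
}

// hostNamespaceViolations lists the host-namespace fields set to true,
// which the Baseline "Host Namespaces" control forbids.
func hostNamespaceViolations(spec PodSpec) []string {
	var out []string
	fields := []struct {
		name string
		val  *bool
	}{
		{"hostNetwork", spec.HostNetwork},
		{"hostPID", spec.HostPID},
		{"hostIPC", spec.HostIPC},
	}
	for _, f := range fields {
		if f.val != nil && *f.val {
			out = append(out, f.name)
		}
	}
	return out
}

// admits reports whether a pod passes the host-namespace check at a level.
// Baseline and Restricted both reject host namespaces; Privileged allows them.
func admits(level Level, spec PodSpec) bool {
	if level == Privileged {
		return true
	}
	return len(hostNamespaceViolations(spec)) == 0
}

func main() {
	t := true
	// An MPS-style pod that shares the host IPC and PID namespaces.
	mpsPod := PodSpec{HostIPC: &t, HostPID: &t}
	fmt.Println(admits(Privileged, mpsPod))      // true
	fmt.Println(admits(Baseline, mpsPod))        // false
	fmt.Println(hostNamespaceViolations(mpsPod)) // [hostPID hostIPC]
}
```

In a real cluster, a namespace would have to opt into the unrestricted policy (for example with the `pod-security.kubernetes.io/enforce: privileged` label) before such a pod is admitted, independently of anything Knative allows.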

psschwei, Sep 15 '22

/remove-lifecycle stale

This might be a reasonable Knative extension; it probably doesn't belong in the spec, as it may not be portable across all spec implementations.

If you want to add this, I'd put the validation relaxation behind an existing or new feature flag, particularly since it sounds like the implementation may still be in flux.
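As a rough illustration of "putting the validation relaxation behind a feature flag", here is a minimal Go sketch. The flag names (`kubernetes.podspec-hostipc`, `kubernetes.podspec-hostpid`), the `Features` map, and `validateHostNamespaces` are all hypothetical, loosely modeled on the `kubernetes.podspec-*` keys in Knative's `config-features` ConfigMap; this is not Knative's actual webhook code. The design keeps strict validation as the default, so a Service that sets hostIPC is rejected unless an operator explicitly enables the flag.

```go
package main

import (
	"errors"
	"fmt"
)

// Features is a hypothetical view of feature flags keyed by flag name,
// with values such as "enabled" or "disabled" (unset behaves as disabled).
type Features map[string]string

// Hypothetical flag names; not confirmed Knative configuration keys.
const (
	flagHostIPC = "kubernetes.podspec-hostipc"
	flagHostPID = "kubernetes.podspec-hostpid"
)

// HostNamespaces holds the pod-level fields discussed in this issue.
type HostNamespaces struct {
	HostIPC bool
	HostPID bool
}

// validateHostNamespaces rejects hostIPC/hostPID unless the matching flag
// is enabled, so the strict behavior remains the default.
func validateHostNamespaces(spec HostNamespaces, feats Features) error {
	var errs []error
	if spec.HostIPC && feats[flagHostIPC] != "enabled" {
		errs = append(errs, fmt.Errorf("hostIPC requires feature flag %q", flagHostIPC))
	}
	if spec.HostPID && feats[flagHostPID] != "enabled" {
		errs = append(errs, fmt.Errorf("hostPID requires feature flag %q", flagHostPID))
	}
	// errors.Join (Go 1.20+) returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	spec := HostNamespaces{HostIPC: true}
	// Flag unset: the request is rejected with a descriptive error.
	fmt.Println(validateHostNamespaces(spec, Features{}))
	// Flag enabled: validation passes and prints <nil>.
	fmt.Println(validateHostNamespaces(spec, Features{flagHostIPC: "enabled"}))
}
```

Gating each field separately means an operator could allow hostIPC for MPS-style sharing without also opening up hostPID.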

evankanderson, Sep 15 '22

Thanks for reopening this. I need this to be able to use vGPUs in KServe, since it uses Knative. Thanks again.

pablocael, Sep 17 '22

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot], Dec 17 '22

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

knative-prow-robot, Jan 16 '23

@pablocael Hi, are you still interested in reopening your PR, or should we close this issue?

skonto, Dec 01 '23

Do you mean this PR: https://github.com/knative/serving/pull/13310?

I have no interest in pursuing this PR right now. Thanks!

pablocael, Feb 27 '24

We have a few clients with a business need for hostIPC. As @pablocael mentioned on Sept. 16, 2022:

I need this to be able to use vgpus in kserve, as it uses knative.

As shown in the attached KsvcReconciler log, the hostIPC field gets overwritten regardless of how the service is deployed. Has there been any update on this front? (Attachment: ksvc-reconciler.json)

pdred, Apr 23 '24