Knative options to support shared GPU resources with PSP
/area API
/kind spec
In our use cases, enabling GPU resource sharing would add significant value for running serverless GPU workloads, since many of these workloads occupy only a fraction of a GPU.
There are currently some solutions, such as Nvidia MPS, that enable Kubernetes to virtualize a partial GPU. However, Nvidia MPS requires hostIPC and hostPID to function properly.
PSP configurations such as enabling hostIPC are not supported within the Knative Serving spec, and they do not seem to be included in the extensions either. From this discussion, it seems that adding PSP support would not be considered, since PSPs are on a deprecation path?
We would like to know whether Knative has any plans or solutions for enabling the hostIPC or hostPID pod features, or alternatives that achieve the same effect. Thanks.
This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.
Hey, has anyone addressed this? I also need to enable hostIPC to use vGPUs with KServe.
A couple of things to note here: with the move to Pod Security Standards in Kubernetes v1.25+, sharing host namespaces would be disallowed under any policy other than "privileged". It could also run into issues if we were to enable user namespaces in the future.
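To make the Pod Security Standards point concrete, here is a minimal Go sketch of the "Host Namespaces" control: the baseline profile rejects spec.hostNetwork, spec.hostPID, and spec.hostIPC when set to true, restricted inherits that control, and only privileged allows them. The PodHostNamespaces struct and the hostNamespaceViolations function are hypothetical stand-ins for illustration, not the real k8s.io/api types or the Pod Security admission code.

```go
package main

import "fmt"

// PodHostNamespaces is a hypothetical, trimmed-down stand-in for the
// host-namespace fields of a Kubernetes PodSpec. It is NOT the real
// k8s.io/api/core/v1 type.
type PodHostNamespaces struct {
	HostNetwork bool
	HostPID     bool
	HostIPC     bool
}

// Level mirrors the three Pod Security Standards profile names.
type Level string

const (
	Privileged Level = "privileged"
	Baseline   Level = "baseline"
	Restricted Level = "restricted"
)

// hostNamespaceViolations returns the fields the "Host Namespaces" control
// would reject at the given level. Only privileged allows sharing host
// namespaces; baseline forbids it and restricted inherits baseline.
func hostNamespaceViolations(spec PodHostNamespaces, level Level) []string {
	if level == Privileged {
		return nil
	}
	var violations []string
	if spec.HostNetwork {
		violations = append(violations, "spec.hostNetwork")
	}
	if spec.HostPID {
		violations = append(violations, "spec.hostPID")
	}
	if spec.HostIPC {
		violations = append(violations, "spec.hostIPC")
	}
	return violations
}

func main() {
	// An MPS-style workload that needs both hostIPC and hostPID.
	mps := PodHostNamespaces{HostPID: true, HostIPC: true}
	for _, level := range []Level{Privileged, Baseline, Restricted} {
		fmt.Printf("%s: %v\n", level, hostNamespaceViolations(mps, level))
	}
}
```

For an MPS-style pod, only the privileged level reports no violations, which is why such workloads would need a namespace labeled with the privileged policy.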
That said, it's probably worth discussing whether this is something to consider enabling (behind a feature gate). cc @dprotaso and @evankanderson for any API or security concerns.
/remove-lifecycle stale
This might be a reasonable Knative extension; it probably doesn't belong in the spec, as it may not be portable across all spec implementations.
If you want to add this, I'd put the validation relaxation behind an existing or new feature flag, particularly since it sounds like the implementation may still be in flux.
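The suggestion to put the validation relaxation behind a feature flag can be sketched in Go as a default-deny check: hostIPC stays rejected unless an operator explicitly turns the gate on. Everything here (the Flag values, the Features and Spec structs, the PodSpecHostIPC field, and validateHostIPC) is hypothetical and only illustrates the pattern; it is not Knative's actual validation code or configuration key.

```go
package main

import (
	"errors"
	"fmt"
)

// Flag is a hypothetical feature-flag state (illustrative names only).
type Flag string

const (
	Enabled  Flag = "enabled"
	Disabled Flag = "disabled"
)

// Features holds a hypothetical gate for relaxing hostIPC validation.
type Features struct {
	PodSpecHostIPC Flag
}

// Spec is a hypothetical, trimmed-down stand-in for the user's pod spec.
type Spec struct {
	HostIPC bool
}

var errHostIPCDisallowed = errors.New(
	"spec.hostIPC: must not be set unless the hostIPC feature flag is enabled")

// validateHostIPC keeps the default behavior (reject hostIPC) and relaxes it
// only when the gate is explicitly Enabled. An unset flag ("") counts as off.
func validateHostIPC(spec Spec, features Features) error {
	if !spec.HostIPC {
		return nil // nothing to validate
	}
	if features.PodSpecHostIPC == Enabled {
		return nil // operator opted in
	}
	return errHostIPCDisallowed
}

func main() {
	spec := Spec{HostIPC: true}
	fmt.Println(validateHostIPC(spec, Features{PodSpecHostIPC: Disabled}))
	fmt.Println(validateHostIPC(spec, Features{PodSpecHostIPC: Enabled}))
}
```

Treating the zero value as "off" keeps existing clusters unchanged when the flag is absent. Relaxing validation alone would not be enough if the field is also dropped later when the pod spec is built, as the last comment in this thread reports.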
Thanks for reopening this. I need it to be able to use vGPUs in KServe, since KServe uses Knative. Thanks again.
This issue or pull request is stale because it has been open for 90 days with no activity.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
/lifecycle stale
@pablocael Hi, are you still interested in reopening your PR, or should we close this issue?
Do you mean this PR: https://github.com/knative/serving/pull/13310?
I have no interest in pursuing this PR right now. Thanks!
We have a few clients with a business need for hostIPC. As @pablocael previously mentioned on Sept. 16, 2022: "I need this to be able to use vgpus in kserve, as it uses knative."
As shown in the attached KsvcReconciler log (ksvc-reconciler.json), this field gets overwritten regardless of how the service is deployed. Has there been any update on this front?