chore(sdk): check that requested memory is less than or equal to the corresponding limit
Description of your changes: raise an error with a descriptive message if the requested resources (memory/CPU) exceed the corresponding limit when the pipeline is defined.
Example:
from kfp.dsl import pipeline_task
task = pipeline_task.PipelineTask(...).set_memory_request('2M').set_memory_limit('1M')
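A minimal, self-contained sketch of the kind of check being described (illustrative only; the helper names and the unit table below are assumptions, not the merged implementation):
_UNITS = {'K': 1e3, 'M': 1e6, 'G': 1e9, 'Ki': 2**10, 'Mi': 2**20, 'Gi': 2**30}
def to_bytes(value: str) -> float:
    # Try the longest suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return float(value[:-len(suffix)]) * _UNITS[suffix]
    return float(value)
def check_memory(request: str, limit: str) -> None:
    if to_bytes(request) > to_bytes(limit):
        raise ValueError(f'Memory request {request} exceeds memory limit {limit}.')
check_memory('2M', '1M')  # raises ValueError, mirroring the example above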
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chensun for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hi @ntny. Thanks for your PR.
I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/rerun-all
@ntny Please take a look at https://github.com/kubeflow/pipelines/blob/master/sdk/CONTRIBUTING.md#code-style to fix the failing YAPF tests.
Looks like there are sdk ci failures, @ntny can you take a look?
Sure, in a few days.
/rerun-all
/rerun-all
/rerun-all
For validation purposes, I had to restore the code fragment that converts requested-resource memory strings to a float (so that limits and requests can be compared).
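To illustrate why a numeric conversion is needed before comparing (made-up values, not the restored fragment itself): comparing the raw strings gives the wrong answer because the comparison is lexicographic.
print('9M' > '10G')            # True lexicographically, but 9 megabytes is far below 10 gigabytes
print(9 * 10**6 > 10 * 10**9)  # False once both sides are converted to bytes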
@HumairAK The latest CI check failed due to a timeout. I suspect it is unrelated to this PR.
/retest
Thanks for the PR.
This kind of thing -- baking Kubernetes logic into KFP -- makes me a bit nervous. I already don't like that the kfp-kubernetes SDK basically "chases" the official python client, and the slope here seems slippery.
Using a different example, today we allow mounting a secret as a volume. Is it KFP's responsibility to make sure that the secret name is a valid Kubernetes name, or should we just rely on the Kubernetes server to deal with that? Going further, should we be pre-checking that the secret exists, or should we just let Kubernetes tell us that the volume mount failed at pipeline run time?
I've been mostly thinking we should just leave these sorts of things up to Kubernetes, and treat kfp-kubernetes as basically a "dumb" yaml generator.
This change seems fairly innocuous, but it's more code to maintain, and we're on the hook if the Kubernetes behavior changes someday.
I'd be a little more receptive to the change if we could somehow use the official python client. Did you by any chance look into that?
What do others think?
Thanks, I agree with your points. My reasoning for this PR:
- pipeline_task already performs similar memory checks on input parameters, and that is also k8s-specific validation: https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/pipeline_task.py#L457 (a sketch of that kind of check follows this list). I am not sure my check adds anything fundamentally different.
- Requesting resources beyond the limits looks abnormal, and this behavior isn’t specific to Kubernetes in my view. It seems to me that any API providing request/limit options should handle this validation.
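For illustration, a format check of that general kind might look like this (a paraphrase of the idea, not the exact code behind the link above):
import re
def validate_memory_format(memory: str) -> None:
    if re.fullmatch(r'[0-9]+(\.[0-9]+)?(E|Ei|P|Pi|T|Ti|G|Gi|M|Mi|K|Ki)?', memory) is None:
        raise ValueError(f'Invalid memory string: {memory!r}')
validate_memory_format('2Gi')   # accepted
validate_memory_format('2 GB')  # raises ValueError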
But overall, I agree with your arguments about the maintenance burden, and it's up to you to decide here. Regarding the Python Kubernetes client: thanks, I will take a look at it.
@chensun do you have any thoughts on the above two comments? I'm not sure how much "Kubernetes logic" (for lack of a better term) we should be baking into KFP. Anton has some good points.
Hi @gregsheremeta! Regarding the use of the official Python client for validation at the KFP SDK level: I might be mistaken, but in my view it is not suitable here. The SDK generates its own specification (or, in previous versions, worked directly with Argo) and does not operate within a Kubernetes context, whereas the official client works with Kubernetes entities. The official Python client does support a dry-run pattern as an alternative for validation. I'm not sure whether that would be useful for the entire pipeline, but in my opinion it would have to be implemented in the KFP driver rather than in the SDK, and as a consequence validation would occur at a different, later stage of the run.
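For reference, a server-side dry-run with the official client looks roughly like this (illustrative only; it requires a live cluster and is not something this PR adds):
from kubernetes import client, config
config.load_kube_config()
core = client.CoreV1Api()
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name='dry-run-example'),
    spec=client.V1PodSpec(containers=[
        client.V1Container(
            name='main',
            image='busybox',
            resources=client.V1ResourceRequirements(
                requests={'memory': '2M'}, limits={'memory': '1M'})),
    ]))
# dry_run='All' asks the API server to validate the object without persisting it;
# here it rejects the pod because the memory request exceeds the limit.
core.create_namespaced_pod(namespace='default', body=pod, dry_run='All')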
I've been mostly thinking we should just leave these sorts of things up to Kubernetes, and treat kfp-kubernetes as basically a "dumb" yaml generator.
I personally lean towards keeping the SDK as a "dumb" yaml generator and not baking too much logic into it. Otherwise, the SDK might become a bottleneck, and we would need to keep chasing trivial changes from the Kubernetes side.
pipeline_task already performs similar memory checks on input parameters, and that is also k8s-specific validation: https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/pipeline_task.py#L457 . I am not sure my check adds anything fundamentally different.
This check is something we inherited from the KFP v1 codebase; it's probably much easier to keep it as-is than to reason the other way around. While I agree that client-side validation is nice to have, and catching errors earlier is often desirable UX, I do want to point out that such a check has limited use, and the potential downside could be an inconsistent UX. For input parameter validation, the SDK can only check when the value is available at compilation or submission time, which is not always the case. For example,
some_component().set_memory_limit('8G') # SDK validation can check and throw early error on incorrect format
upstream_task = upstream()
downstream().set_memory_limit(upstream_task.output) # SDK validation cannot do anything with the runtime value
I personally prefer having consistent behavior and error messages in both cases.
I think this was accidentally reopened? Based on the discussion here and in the community call a few weeks ago, we have decided to keep this level of validation for k8s resources out of the SDK.