
`V1PodFailurePolicyRule` does not work without `on_pod_conditions`


What happened (please include outputs or screenshots): I tried implementing the first example of the Kubernetes documentation (https://kubernetes.io/docs/tasks/job/pod-failure-policy/) for pod failure policies:

podFailurePolicy:
    rules:
    - action: FailJob
      onExitCodes:
        containerName: main
        operator: In
        values: [42]

I implemented it the following way:

spec = client.V1JobSpec(
    ttl_seconds_after_finished=100,
    template=template,
    backoff_limit=100,
    pod_failure_policy=client.V1PodFailurePolicy(
        rules=[client.V1PodFailurePolicyRule(
            action='FailJob',
            on_exit_codes=client.V1PodFailurePolicyOnExitCodesRequirement(
                operator='In',
                values=[42]
            ))]
    ))

However, I always get an exception:

  File "/root/job_spawner/run.py", line 236, in _create_job_object
    rules=[client.V1PodFailurePolicyRule(
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/models/v1_pod_failure_policy_rule.py", line 61, in __init__
    self.on_pod_conditions = on_pod_conditions
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/models/v1_pod_failure_policy_rule.py", line 130, in on_pod_conditions
    raise ValueError("Invalid value for `on_pod_conditions`, must not be `None`")  # noqa: E501
ValueError: Invalid value for `on_pod_conditions`, must not be `None`

I tried working around the issue by providing an empty list to on_pod_conditions, but then the exception occurs when deserializing the model:

  File "/root/job_spawner/run.py", line 260, in _create_job
    api_response = api_instance.create_namespaced_job(
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api/batch_v1_api.py", line 210, in create_namespaced_job
    return self.create_namespaced_job_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api/batch_v1_api.py", line 309, in create_namespaced_job_with_http_info
    return self.api_client.call_api(
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 192, in __call_api
    return_data = self.deserialize(response_data, response_type)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 264, in deserialize
    return self.__deserialize(data, response_type)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 280, in __deserialize
    return [self.__deserialize(sub_data, sub_kls)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 280, in <listcomp>
    return [self.__deserialize(sub_data, sub_kls)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 641, in __deserialize_model
    instance = klass(**kwargs)
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/models/v1_pod_failure_policy_rule.py", line 61, in __init__
    self.on_pod_conditions = on_pod_conditions
  File "/root/.local/lib/python3.10/site-packages/kubernetes/client/models/v1_pod_failure_policy_rule.py", line 130, in on_pod_conditions
    raise ValueError("Invalid value for `on_pod_conditions`, must not be `None`")  # noqa: E501
ValueError: Invalid value for `on_pod_conditions`, must not be `None`
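
For reference, that workaround attempt amounted to roughly the following (a minimal sketch of what is described above, not the original code):

from kubernetes import client

# Passing an empty list satisfies the client-side check at construction time,
# but the server presumably omits the empty list in its response, so the same
# ValueError is raised again while the response is being deserialized.
client.V1PodFailurePolicyRule(
    action='FailJob',
    on_pod_conditions=[],  # the workaround attempt
    on_exit_codes=client.V1PodFailurePolicyOnExitCodesRequirement(
        operator='In',
        values=[42]
    ))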

What you expected to happen: I expect to be able to implement the example from the Kubernetes documentation.

How to reproduce it (as minimally and precisely as possible):

from kubernetes import client
client.V1PodFailurePolicyRule(
    action='FailJob',
    on_exit_codes=client.V1PodFailurePolicyOnExitCodesRequirement(
        operator='In',
        values=[42]
    ))

Environment:

  • Kubernetes version (kubectl version): v1.26.3
  • OS (e.g., MacOS 10.13.6): MacOS 13.4
  • Python version (python --version) 3.10.3
  • Python client version (pip list | grep kubernetes) 26.1.0

fproske avatar May 16 '23 10:05 fproske

Deserialization Exception on Job

User: I tried working around the issue by providing an empty list to on_pod_conditions, but then the exception occurs when deserializing the model:

File "/root/job_spawner/run.py", line 260, in _create_job api_response = api_instance.create_namespaced_job( File "/root/.local/lib/python3.10/site-packages/kubernetes/client/api/batch_v1_api.py", line 210, in create_namespaced_job return self.create_namespaced_job_with_ht

Review the error message: The error message and the lines of code referenced in the stack trace can provide valuable information about the cause of the exception. Look for any specific error messages or error codes that might give you a clue about what went wrong.

Check the API endpoint and credentials: Ensure that you are using the correct API endpoint and that the credentials you provided have sufficient permissions to create a job in the specified namespace. Double-check the values you passed to the Kubernetes client library to make sure they are accurate.

Verify the model serialization and deserialization process: If the exception occurs during the deserialization of the model, make sure that the model was properly serialized before being passed to the Kubernetes client library. Check that the serialization and deserialization processes are correctly implemented and that the model object is intact when it reaches the point of deserialization.

Examine the model code: Look for any custom code related to the model that might be causing the exception. Check for any issues with the dependencies, serialization, or deserialization of the model. It's also possible that the model itself has some internal inconsistencies or requirements that are not being met.

Consult the Kubernetes client library documentation: Refer to the documentation of the Kubernetes client library you are using to see if there are any specific requirements or considerations when creating a job. Look for any troubleshooting guides or known issues that might be relevant to your situation.

Raaja0007 avatar May 23 '23 20:05 Raaja0007

Could you check the Kubernetes API reference to see if the field on_pod_conditions is a required field?

roycaihw avatar May 24 '23 17:05 roycaihw

I can't find the part in the API documentation that mentions this, but I'm sure it is not required. Actually, I even tried providing both on_exit_codes and on_pod_conditions and got an API error. I worked around this issue for now with this (ugly) monkey patch:

import kubernetes.client


class V1FixedPodFailurePolicyRule(kubernetes.client.V1PodFailurePolicyRule):
    @property
    def on_pod_conditions(self):
        return self._on_pod_conditions

    @on_pod_conditions.setter
    def on_pod_conditions(self, on_pod_conditions):
        """Sets the on_pod_conditions of this V1PodFailurePolicyRule.

        Represents the requirement on the pod conditions. The requirement is represented as a list of pod condition patterns. The requirement is satisfied if at least one pattern matches an actual pod condition. At most 20 elements are allowed.  # noqa: E501

        :param on_pod_conditions: The on_pod_conditions of this V1PodFailurePolicyRule.  # noqa: E501
        :type: list[V1PodFailurePolicyOnPodConditionsPattern]
        """
        self._on_pod_conditions = on_pod_conditions

# Swap in the patched model both on the top-level client package and in the
# models module that the deserializer resolves class names from.
kubernetes.client.V1PodFailurePolicyRule = V1FixedPodFailurePolicyRule
kubernetes.client.models.V1PodFailurePolicyRule = V1FixedPodFailurePolicyRule

That seems to do the trick.
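
With the patch in place, the minimal reproduction from the issue description goes through without raising (a quick sketch, assuming the patch above has already been applied):

import kubernetes.client as client

rule = client.V1PodFailurePolicyRule(
    action='FailJob',
    on_exit_codes=client.V1PodFailurePolicyOnExitCodesRequirement(
        operator='In',
        values=[42]
    ))
print(rule.on_pod_conditions)  # None, and no ValueError is raised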

fproske avatar May 26 '23 12:05 fproske

This issue seems to stem from https://github.com/kubernetes/api/issues/48.

onPodConditions is a required field in the k8s OpenAPI reference, but it shouldn't be, since only one of onExitCodes and onPodConditions should be given (and the API actually behaves this way). (Check out https://raw.githubusercontent.com/kubernetes-client/python/master/scripts/swagger.json)
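
One way to confirm which fields that spec marks as required (a quick sketch; the definition key follows the usual io.k8s.api.batch.v1 naming):

import json
import urllib.request

SPEC_URL = "https://raw.githubusercontent.com/kubernetes-client/python/master/scripts/swagger.json"
with urllib.request.urlopen(SPEC_URL) as resp:
    spec = json.load(resp)

rule_schema = spec["definitions"]["io.k8s.api.batch.v1.PodFailurePolicyRule"]
print(rule_schema.get("required"))  # included 'onPodConditions' at the time of this issue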

Since all the code in this package is autogenerated by openapi-generator, monkey patching as @fproske proposed may be the best alternative until the OpenAPI spec changes.

xxnpark avatar Jul 25 '23 16:07 xxnpark

Guess k8s 1.28.2 fixed the underlying issue (https://github.com/kubernetes/kubernetes/pull/120208). Just need to wait for a new client now (probably for k8s 1.29).

fproske avatar Oct 13 '23 14:10 fproske

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 30 '24 07:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 29 '24 07:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 30 '24 07:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the triage bot's /close not-planned command above:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 30 '24 07:03 k8s-ci-robot