Add support for Kubernetes tolerations
This PR adds support for Kubernetes tolerations for Kubeflow Pipelines and Apache Airflow runtime environments.
Deviations from the specification:
- `tolerationSeconds` is not supported (as requested in https://github.com/elyra-ai/elyra/issues/2823), but support can be added in the future
- 'should' constraints are implemented as 'must' constraints
Tolerations are supported for generic and custom components and can be defined as pipeline defaults or for individual nodes.
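For reference, a Kubernetes toleration consists of up to four fields in this PR's scope: `key`, `operator`, `value`, and `effect` (with `tolerationSeconds` excluded, as noted above). A minimal sketch of how a toleration might be represented before being attached to a pod spec — field names follow the Kubernetes API, but the helper itself is illustrative, not Elyra's actual data model:

```python
# Sketch only: builds a toleration dict using the field names from the
# Kubernetes pod spec (spec.tolerations). Empty optional fields are
# omitted, matching how the Kubernetes API treats unset fields.
def make_toleration(key: str, operator: str, value: str, effect: str) -> dict:
    toleration = {"operator": operator}
    if key:
        toleration["key"] = key
    if value:
        toleration["value"] = value
    if effect:
        toleration["effect"] = effect
    return toleration

# Example: tolerate nodes tainted with gpu=true:NoSchedule
gpu_toleration = make_toleration("gpu", "Equal", "true", "NoSchedule")

# Example: tolerate any taint (empty key with operator Exists)
wildcard_toleration = make_toleration("", "Exists", "", "")
```

Defined as a pipeline default, such a toleration would apply to every node; defined on an individual node, it applies only to that node's pod.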
Pipeline default settings in VPE
- Displayed in section "Node defaults":
Generic node setting in VPE
Custom node setting in VPE
- Displayed in section "Additional properties"
Applied tolerations (KFP)
Confirmation that the tolerations are applied on the pod
- Top: output of `kubectl describe pod lambda-...`
- Bottom: pod summary in the Kubeflow central dashboard
Notes:
- Status (if checked, functionality is implemented and was verified):
- [x] Pipeline editor: validation
- [x] KFP generic node property
- [x] KFP custom node property
- [x] KFP pipeline default
- [x] Airflow generic node property
- [x] Airflow custom node property
- [x] Airflow pipeline default
- [Requires discussion] The current UI support for list-based properties requires input in the form `SOME_KEY=SOME_VALUE`. The current implementation assumes input as `<T_ID>=<key>:<op>:<value>:<effect>`, where `<T_ID>` is an arbitrary identifier.
- Proper UI support requires https://github.com/elyra-ai/elyra/pull/2780
- Closes #2681
- Closes #2823
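The interim `<T_ID>=<key>:<op>:<value>:<effect>` input format could be parsed along the following lines. This is a hedged sketch under stated assumptions, not the PR's actual implementation; the function name and validation rules are illustrative:

```python
def parse_toleration_entry(entry: str) -> tuple:
    """Parse '<T_ID>=<key>:<op>:<value>:<effect>' into its parts.

    Sketch only: mirrors the interim input format described in this PR,
    not necessarily Elyra's implementation.
    """
    # Split off the arbitrary identifier before the first '='
    t_id, _, spec = entry.partition("=")
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected '<key>:<op>:<value>:<effect>', got '{spec}'")
    key, op, value, effect = parts
    # Kubernetes supports exactly these two toleration operators
    if op not in ("Exists", "Equal"):
        raise ValueError(f"unsupported operator '{op}'")
    return t_id, key, op, value, effect

# Example: identifier 't1', tolerating the taint gpu=true:NoSchedule
parsed = parse_toleration_entry("t1=gpu:Equal:true:NoSchedule")
```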
What changes were proposed in this pull request?
- Added new optional 'Kubernetes tolerations' pipeline default property
- Added new optional 'Kubernetes tolerations' node property
- Updated documentation
How was this pull request tested?
- Added new validation tests
- Manual testing (see scope and status in previous section)
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the Apache License 2.0; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
[Requires discussion] The current UI support for list-based properties requires input in the form `SOME_KEY=SOME_VALUE`. Of the inputs (key, operator, value, and effect), only operator is required, which in essence means that (while this is a work in progress) input would need to be specified as `= : : `.
I think the `key` value must be used in the key position; otherwise, you could only apply two taints max (one for `Exists` and one for `Equal`) if using `operator`, since the set of key/values maps to a dictionary. Since the empty `key` indicates that the taint applies to all keys, perhaps we could let a `key` value of `'*'` indicate this behavior and keep `key` in the key position.
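The suggestion above could look roughly like this — a sketch of the proposal only, with illustrative names: `'*'` is mapped to the empty key (which Kubernetes interprets as matching all keys), and the dictionary is keyed by toleration key rather than by operator, so more than one `Exists` or `Equal` entry can coexist:

```python
def normalize_key(key: str) -> str:
    """Map the proposed '*' wildcard to the empty key, which Kubernetes
    interprets as 'tolerate all taints' when combined with operator Exists."""
    return "" if key == "*" else key

# Keying by (normalized) key instead of by operator allows any number
# of Exists/Equal tolerations, one per distinct taint key.
tolerations: dict = {}
for key, op in [("gpu", "Equal"), ("spot", "Exists"), ("*", "Exists")]:
    k = normalize_key(key)
    tolerations[k] = {"key": k, "operator": op}
```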
Re-confirmed* that, after the merge, the following scenarios yield the expected results:
- [x] run pipeline with generic component (KFP)
- [x] run pipeline with generic component (Airflow)
- [x] export pipeline with generic component (KFP)
- [x] export pipeline with generic component (Airflow)
- [x] run pipeline with custom component (KFP)
- [x] run pipeline with custom component (Airflow)
- [x] export pipeline with custom component (KFP)
- [x] export pipeline with custom component (Airflow)
(*) by inspecting:
- the exported DAG
- the output of `kubectl describe pod ...`
- task instance details properties (Airflow)
- the pod log in the Central dashboard (KFP)