incubator-uniffle
[Umbrella] Better K8S operator support
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Search before asking
- [X] I have searched in the issues and found no similar issues.
Describe the proposal
To support deployment on K8S natively and smoothly, we may have to add the following support:
- expose more fields in the operator's CRD, such as `RuntimeClassName`, `Tolerations`, `Annotation` and `Affinity`, etc., so that the shuffle server could be deployed more flexibly
- `LogHostPath` and `HostPathMounts` may be refactored to be supplied by the container runtime. As shuffle servers may be deployed on mixed nodes, the `HostPathMounts` can be different on different hosts.
- Add a CLI binary to hide the details of RSS operations: rolling upgrade, restart, full upgrade, gray version, etc.
- vpc template support
- service and network refinement:
  - the shuffle server is a network-traffic-heavy application, so it is not wise to use a service to proxy external clients' read/write requests to the shuffle server
  - the coordinators' deployment may need some refinement: in the current architecture, the coordinator can only have 1 replica; otherwise, there would be a split-brain problem
- various bug fixes, such as init-containers resource request/limit.
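
To make the first item more concrete, here is a minimal Go sketch of how user-supplied pod fields from the CRD might be merged into the pod template the operator generates. Everything in it is an assumption for illustration: `PodOverrides`, `PodTemplate`, `ApplyOverrides` and the simplified `Toleration` struct are hypothetical names, not the operator's actual API, and `Affinity` is left out to keep the example short.

```go
package main

import "fmt"

// Toleration is a simplified stand-in for the Kubernetes core/v1 Toleration type.
type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

// PodOverrides is a hypothetical group of fields a CRD could expose to users.
type PodOverrides struct {
	RuntimeClassName *string
	Tolerations      []Toleration
	Annotations      map[string]string
}

// PodTemplate is a simplified stand-in for the pod template built by the operator.
type PodTemplate struct {
	RuntimeClassName *string
	Tolerations      []Toleration
	Annotations      map[string]string
}

// ApplyOverrides returns a copy of base with the user-supplied fields applied.
// The runtime class is replaced, tolerations are appended, and annotations are
// merged with the user's values winning on key conflicts. base is not mutated.
func ApplyOverrides(base PodTemplate, o PodOverrides) PodTemplate {
	out := PodTemplate{
		RuntimeClassName: base.RuntimeClassName,
		Tolerations:      append([]Toleration(nil), base.Tolerations...),
		Annotations:      make(map[string]string, len(base.Annotations)+len(o.Annotations)),
	}
	for k, v := range base.Annotations {
		out.Annotations[k] = v
	}
	if o.RuntimeClassName != nil {
		out.RuntimeClassName = o.RuntimeClassName
	}
	out.Tolerations = append(out.Tolerations, o.Tolerations...)
	for k, v := range o.Annotations {
		out.Annotations[k] = v
	}
	return out
}

func main() {
	rc := "kata" // example runtime class name, chosen arbitrarily
	base := PodTemplate{Annotations: map[string]string{"app": "rss-shuffle-server"}}
	pod := ApplyOverrides(base, PodOverrides{
		RuntimeClassName: &rc,
		Tolerations: []Toleration{
			{Key: "dedicated", Operator: "Equal", Value: "shuffle", Effect: "NoSchedule"},
		},
		Annotations: map[string]string{"team": "data"},
	})
	fmt.Println(*pod.RuntimeClassName, len(pod.Tolerations), len(pod.Annotations))
}
```

Returning a copy rather than mutating the base template is a design choice in this sketch; it keeps the operator's defaults reusable across shuffle servers that carry different overrides.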
Task list
- [x] add more fields in CRD, such as #469 #545
- [x] #288
- [x] #289
- [x] #496
- [x] #522
- [x] #524
Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
cc @wangao1236