incubator-uniffle

[Umbrella] Better K8S operator support

Open advancedxy opened this issue 2 years ago • 1 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the proposal

To support deployment on K8S natively and smoothly, we may have to add the following support:

  1. expose more fields in the operator's CRD, such as RuntimeClassName, Tolerations, Annotation and Affinity, so that the shuffle server can be deployed more flexibly
  2. LogHostPath and HostPathMounts may be refactored to be supplied by the container runtime. As shuffle servers may be deployed on mixed nodes, the HostPathMounts can differ from host to host.
  3. Add a CLI binary to hide the details of RSS operations: rolling upgrade, restart, full upgrade, gray (canary) release, etc.
  4. vpc template support
  5. service and network refinement:
    • the shuffle server is a network-traffic-heavy application, so it's not wise to use a Service to proxy external clients' read/write requests to the shuffle server
    • the coordinators' deployment may need some refinement: in the current architecture, the coordinator replica count can only be 1; otherwise there would be a split-brain problem.
  6. various bug fixes, such as init-containers resource request/limit.
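
To make items 1 and 2 more concrete, here is a minimal Go sketch of what a more flexible spec could look like. Everything in it is an assumption for illustration: `ShuffleServerSpec`, `Toleration`, `NodeHostPathMounts` and `mountsForNode` are hypothetical names, the types are simplified stand-ins rather than the real Kubernetes or operator types, and the per-node-group override is just one possible way to handle mixed nodes, not necessarily the design this proposal ends up with.

```go
package main

import "fmt"

// Toleration is a simplified stand-in for the Kubernetes toleration type
// (hypothetical, trimmed for this sketch).
type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

// ShuffleServerSpec is a hypothetical, trimmed-down spec showing the kind of
// fields item 1 asks to expose. It is not the operator's actual CRD type.
type ShuffleServerSpec struct {
	RuntimeClassName *string
	Tolerations      []Toleration
	Annotations      map[string]string

	// HostPathMounts maps a host path to a container mount path and acts as
	// the default for every node.
	HostPathMounts map[string]string

	// NodeHostPathMounts (hypothetical) overrides HostPathMounts for a named
	// group of nodes, since mixed nodes may expose different host paths.
	NodeHostPathMounts map[string]map[string]string
}

// mountsForNode returns the override for the given node group when one
// exists, and falls back to the default HostPathMounts otherwise.
func mountsForNode(spec ShuffleServerSpec, nodeGroup string) map[string]string {
	if m, ok := spec.NodeHostPathMounts[nodeGroup]; ok {
		return m
	}
	return spec.HostPathMounts
}

func main() {
	runtimeClass := "example-runtime" // placeholder value
	spec := ShuffleServerSpec{
		RuntimeClassName: &runtimeClass,
		Tolerations: []Toleration{
			{Key: "dedicated", Operator: "Equal", Value: "rss", Effect: "NoSchedule"},
		},
		Annotations:    map[string]string{"example.com/owner": "rss"},
		HostPathMounts: map[string]string{"/data1": "/rss/data1"},
		NodeHostPathMounts: map[string]map[string]string{
			"ssd": {"/mnt/ssd0": "/rss/data1"},
		},
	}

	// Print the mounts that would apply to two different node groups.
	for _, group := range []string{"ssd", "hdd"} {
		fmt.Println(group, mountsForNode(spec, group))
	}
}
```

With a shape like this, nodes in the `ssd` group would get their own host paths while all other nodes keep the default mounts.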

Task list

  • [x] add more fields in CRD, such as #469 #545
  • [x] #288
  • [x] #289
  • [x] #496
  • [x] #522
  • [x] #524

Are you willing to submit PR?

  • [x] Yes I am willing to submit a PR!

advancedxy, Jan 09 '23 14:01

cc @wangao1236

advancedxy, Jan 09 '23 14:01