Default k8s scheduler support
Organization Name: Advantech
Short summary about the issue/question: Is the default k8s scheduler supported in OpenPAI v1.5.0? How can we run CPU and GPU tasks on a single GPU worker? (E.g. https://github.com/microsoft/pai/issues/5044)
Brief what process you are following: In v1.0.1, we could use the k8s default scheduler based on https://github.com/microsoft/pai/issues/5044#issuecomment-720410187. When I switch to the k8s default scheduler, the SKU-based scheduling seems incorrect. Is the default k8s scheduler supported in OpenPAI v1.5.0, or will it be in a future release?
How to reproduce it:
- Deploy openpai v1.5.0
- Change the scheduler configuration as follows:

```yaml
hivedscheduler:
  config: |
```

- Apply the configuration:

```bash
./paictl.py service stop -n rest-server hivedscheduler
./paictl.py config push -p <config-folder> -m service
./paictl.py service start -n hivedscheduler rest-server
```

- The SKU on the job submission seems incorrect (see the scheduler check below)
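One quick way to confirm whether the default scheduler actually took over (an ad-hoc check, not from the OpenPAI docs) is to inspect the `schedulerName` of a job pod:

```bash
# Pods placed by the Kubernetes default scheduler report "default-scheduler";
# pods placed by hived carry the hived scheduler's name instead.
# <job-pod-name> is a placeholder for one of your job's pods.
kubectl get pod <job-pod-name> -o jsonpath='{.spec.schedulerName}'
```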
OpenPAI Environment:
- OpenPAI version: v1.5.0
- OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS
Not sure if we still support the default scheduler. @abuccts to help.
@JosephKang, could you describe the detailed scheduling behavior? We haven't tested the default scheduler in a while.
The default scheduler is used to set job resources on demand instead of allocating in SKU units, and it might achieve the maximum utilization of the worker node.
The following scenarios might be a good example for one worker with 1 GPU / 9 CPU of resources. Please let me know if my understanding is incorrect (a protocol sketch of scenario a follows below).
- Scenario a: one 1GPU/4CPU task and one 4CPU task at the same time.
- Scenario b: two 4CPU tasks at the same time.
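For concreteness, scenario a could be expressed as a single job with two task roles. This is only a sketch: the job name, image, commands, and memory figures are made up, since the scenario only fixes the GPU/CPU counts.

```yaml
# Sketch of an OpenPAI job protocol (v2) for scenario a:
# one 1GPU/4CPU task and one CPU-only 4CPU task on the same worker.
protocolVersion: 2
name: mixed-cpu-gpu-demo        # hypothetical job name
type: job
prerequisites:
  - name: image
    type: dockerimage
    uri: ubuntu:18.04           # placeholder image
taskRoles:
  gputask:
    instances: 1
    dockerImage: image
    resourcePerInstance:
      gpu: 1
      cpu: 4
      memoryMB: 8192            # memory is an assumption; the scenario fixes only GPU/CPU
    commands:
      - ./train.sh              # placeholder command
  cputask:
    instances: 1
    dockerImage: image
    resourcePerInstance:
      gpu: 0
      cpu: 4
      memoryMB: 8192
    commands:
      - ./preprocess.sh         # placeholder command
```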
It seems you are asking whether the webportal allows assigning a fraction of a resource other than the defined SKU? We prefer users to consume resources at the granularity of a SKU to avoid unnecessary fragmentation (so in the webportal you cannot set resources other than SKUs). If you want more fine-grained resource usage, you can specify it through the OpenPAI SDK.
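For reference, fine-grained requests correspond to the `resourcePerInstance` field of the job protocol, which the SDK/REST path lets you set directly. The fragment below mirrors the 2 CPU / 20 GB example in the next comment; the task-role name is arbitrary:

```yaml
# Fragment of an OpenPAI job protocol (v2); a full protocol also needs
# protocolVersion, name, type, dockerImage, commands, etc.
taskRoles:
  taskrole:                 # arbitrary task-role name
    instances: 1
    resourcePerInstance:
      gpu: 0
      cpu: 2
      memoryMB: 20480       # 20 GB, matching the example below
```

Such a protocol can be submitted through the rest-server job API (`POST /api/v2/jobs` with `Content-Type: text/yaml`) or via the Python SDK.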
We hope to have more fine-grained resource usage. It seems that a task pod's resources can be set through API parameters instead of SKU units, but the available-resource deduction still seems to happen at the granularity of a SKU.
E.g.
Total resources = 2 GPU, 8 CPU, 50 GB RAM
SKU = 1 GPU / 4 CPU / 25 GB RAM
Request via API = 2 CPU / 20 GB RAM
Remaining available = 1 GPU / 4 CPU / 25 GB RAM (1 SKU left)
Is this also the preferred behavior, in order to stay in sync with SKUs?
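For what it's worth, the arithmetic above is consistent with rounding each requested dimension up to whole SKUs and deducting the maximum across dimensions; this is only my reading of the example, not confirmed scheduler behavior:

$$
\text{SKUs deducted} = \max\!\left(\left\lceil \frac{2\ \text{CPU}}{4\ \text{CPU/SKU}} \right\rceil,\ \left\lceil \frac{20\ \text{GB}}{25\ \text{GB/SKU}} \right\rceil\right) = \max(1, 1) = 1
$$

so 2 SKUs total minus 1 deducted leaves the reported 1 SKU (1 GPU / 4 CPU / 25 GB).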