Zhiyuan He
## Motivation 1. Current test cases are mainly located in https://github.com/microsoft/hivedscheduler/blob/v0.3.4/pkg/algorithm/hived_algorithm_test.go . There are a lot of global variables and the code uses a lot of functions to reference them,...
Deployment in China faces some network-related issues. @siaimes has a good solution for this. Please use this issue to discuss the topic.
Sometimes the k8s network plugin causes problems in OpenPAI. Users can follow the steps below to solve them: 1. Remove CNI in Kubernetes. Detailed steps are provided here:...
to resolve #5419
new schema in OpenPAI job protocol: ```yaml extras: gangAllocation: true/false hivedscheduler: jobPriorityClass: crit | prod | test (default) | oppo taskRoles: : pinnedCellId: string | null (default) # if resourcePerInstance...
## Motivation When PAI is deployed on cloud, admins may want to stop some idle nodes to save money. When a new job is submitted, the stopped nodes can be...
1. `systemd-resolved` sets `nameserver 127.0.0.53` in `/etc/resolv.conf` 2. This will cause `dns loop` in coredns pod, which will result in CrashLoopBackOff. TODO: add a check in `requirement.sh` to check this...
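A minimal sketch of such a check, written in Python for illustration rather than as the actual `requirement.sh` change. The function name `loopback_nameservers` is hypothetical; it only assumes the standard `resolv.conf` format, where each `nameserver` line carries one IP address.

```python
import ipaddress


def loopback_nameservers(resolv_conf_text):
    """Return nameserver addresses in resolv.conf text that are loopback.

    A loopback nameserver (e.g. 127.0.0.53 from systemd-resolved) is what
    makes CoreDNS forward queries back to itself.
    """
    found = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';' at line start).
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if parts[0] != "nameserver" or len(parts) < 2:
            continue
        try:
            addr = ipaddress.ip_address(parts[1])
        except ValueError:
            # Not a plain IP address; ignore it in this sketch.
            continue
        if addr.is_loopback:
            found.append(parts[1])
    return found
```

A pre-install check could read `/etc/resolv.conf`, pass its contents to this function, and fail with a clear message if the returned list is non-empty.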
## Motivation #5145 has extended the prerequisite field. But users can only use and share prerequisites in job yaml. We can support a UI for prerequisites, especially for data prerequisites. This...
If there are a lot of waiting jobs (e.g. 30000+ frameworks) and the framework watcher does a re-listing, memory usage quickly rises above 1000 MB.
Currently internal storage is used as a service in OpenPAI. We can make it an init container. Using init container has the following advantages: - Different services can leverage the...