Zhiyuan He

Results 12 issues of Zhiyuan He

## Motivation 1. Current test cases are mainly located in https://github.com/microsoft/hivedscheduler/blob/v0.3.4/pkg/algorithm/hived_algorithm_test.go . There are a lot of global variables and the code uses a lot of functions to reference them,...

Deployment in China faces some network-related issues. @siaimes has a good solution for this. Please use this issue to discuss the topic.

deployment

Sometimes the Kubernetes network plugin causes problems in OpenPAI. Users can follow the steps below to solve them: 1. Remove CNI in Kubernetes. Detailed steps are provided here:...

new schema in OpenPAI job protocol: ```yaml extras: gangAllocation: true/false hivedscheduler: jobPriorityClass: crit | prod | test (default) | oppo taskRoles: : pinnedCellId: string | null (default) # if resourcePerInstance...

## Motivation When PAI is deployed on cloud, admins may want to stop some free nodes to save money. When a new job is submitted, the closed nodes can be...

1. `systemd-resolved` sets `nameserver 127.0.0.53` in `/etc/resolv.conf`. 2. This causes a DNS loop in the coredns pod, which results in `CrashLoopBackOff`. TODO: add a check in `requirement.sh` to check this...
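The check mentioned in the TODO could look roughly like the sketch below. It is written in Python for illustration only and is not the actual content of `requirement.sh`. The function name `loopback_nameservers` is hypothetical. It assumes the only condition of interest is a `nameserver` entry that points at a loopback address such as `127.0.0.53`.

```python
import ipaddress


def loopback_nameservers(resolv_conf_text):
    """Return the nameserver addresses in resolv.conf text that are loopback.

    Hypothetical helper: a loopback nameserver (for example 127.0.0.53 from
    systemd-resolved) is the condition the issue says leads to a DNS loop.
    """
    found = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (resolv.conf uses '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        try:
            address = ipaddress.ip_address(fields[1])
        except ValueError:
            # Not a parseable IP address; ignore it in this sketch.
            continue
        if address.is_loopback:
            found.append(fields[1])
    return found


if __name__ == "__main__":
    with open("/etc/resolv.conf") as handle:
        offenders = loopback_nameservers(handle.read())
    if offenders:
        print("loopback nameserver(s) found:", ", ".join(offenders))
```

A pre-deployment script could run this on each node and stop with a warning when the returned list is not empty.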

known issue

## Motivation #5145 has extended the prerequisite field. But users can only use and share prerequisites in job YAML. We can support a UI for prerequisites, especially for data prerequisites. This...

If there are many waiting jobs (e.g. 30000+ frameworks) and the framework watcher does a re-listing, memory usage quickly rises above 1000 MB.

pai-dev

Currently, internal storage is used as a service in OpenPAI. We can make it an init container. Using an init container has the following advantages: - Different services can leverage the...