Add alert for GPU perf issue
We already have an alert for this issue, but it does not cover all situations.
- [ ] Add case: the GPU is in the P0 performance state, but the application clock is not correct
- [ ] Add an auto-fix tool. When this issue is detected, we can launch a privileged pod and run a command to fix it.
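
A minimal sketch of the extra detection case, not a definitive implementation. It assumes "not correct" means the current application graphics clock differs from the default application clock, and that the input is the CSV output of `nvidia-smi --query-gpu=index,pstate,clocks.applications.graphics,clocks.default_applications.graphics --format=csv,noheader,nounits`. The function name `find_clock_mismatches` is hypothetical.

```python
import csv
import io

# Fields to pass to `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`.
QUERY_FIELDS = (
    "index,pstate,clocks.applications.graphics,"
    "clocks.default_applications.graphics"
)


def find_clock_mismatches(csv_text):
    """Return indices of GPUs that are in P0 but whose application clock
    differs from the default application clock (assumed definition of
    "not correct")."""
    mismatched = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        index, pstate, app_clock, default_clock = (f.strip() for f in row)
        if pstate != "P0":
            continue  # other performance states are outside this case
        try:
            if int(app_clock) != int(default_clock):
                mismatched.append(int(index))
        except ValueError:
            # Non-numeric values such as "[N/A]" cannot be compared.
            continue
    return mismatched
```

The returned GPU indices could feed the alert rule. For the auto-fix step, one candidate command to run in the privileged pod is `nvidia-smi -rac` (reset application clocks); whether that is the right fix for this issue is an assumption and should be verified on an affected node.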