Fixes #2492 in four steps: 1. modify the `~/.ssh/config` file; 2. open a port for the container; 3. add a label to the pod; 4. add `podAntiAffinity` for pods that use `hostNetwork`.
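Steps 3 and 4 above can be illustrated with a minimal sketch. It is not taken from the fix itself: the label key `example.com/host-network` and the helper name are hypothetical, and the choice of a hard (`required...`) rule is an assumption. The idea is that pods using `hostNetwork` share the node's port space, so an anti-affinity rule keyed on a common label keeps two such pods off the same node.

```python
def host_network_anti_affinity(label_key, label_value):
    """Build an `affinity` fragment that keeps pods with the given label on separate nodes.

    The label key/value are illustrative assumptions, not names from the original fix.
    """
    return {
        "podAntiAffinity": {
            # Hard rule: the scheduler will not co-locate two matching pods.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {label_key: label_value}},
                    # One matching pod per node (node hostname is the topology domain).
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


# Hypothetical pod fragment: the label from step 3 plus the rule from step 4.
pod = {
    "metadata": {"labels": {"example.com/host-network": "true"}},
    "spec": {
        "hostNetwork": True,
        "affinity": host_network_anti_affinity("example.com/host-network", "true"),
    },
}
```

The resulting dictionary mirrors the structure of a pod manifest and can be serialized to YAML or JSON with whatever tooling the project already uses.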
### Describe the feature Add [code-server](https://github.com/coder/code-server) support. ### Why do you need this feature? Although the remote plugin for VS Code is now supported, sometimes it is more convenient to use...
**Is your feature request related to a problem? Please describe.** As far as I know, the current distributed optimizer in Megatron-LM implements ZeRO-1, but ZeRO-1 does not save enough GPU...
See https://kubernetes.io/docs/reference/scheduling/policies/: `policy-config-file` was removed in Kubernetes 1.23.
**Is your feature request related to a problem? Please describe.** I use the Dvorak keyboard layout, and I cannot find a way to change the WSLg keyboard from QWERTY to...
The Reservation plugin has been deleted, so this doc is no longer needed and will only confuse users. I also recommend cleaning up other documents that have expired.
#### What would you like to be added: When using the job policy to retry a failed task, retry it in place instead of retrying after releasing the resources. ####...
**Is your feature request related to a problem? Please describe.** I noticed that the version of black currently used by Megatron-LM is very old and already behind the one NeMo uses: https://github.com/NVIDIA/NeMo/blob/b0f3138a6be7fab3175deb8935f8492aeb1445bd/pyproject.toml#L33 **Describe the solution...
### What is the problem you're trying to solve Currently there is no validation when creating a jobflow or jobtemplate, which may cause the controller or scheduler to panic. ### Describe...