LX

36 issues by LX

Fixes #2492. The fix takes four steps:

1. Modify the `~/.ssh/config` file.
2. Open a port for the container.
3. Add a label to the pod.
4. Add `podAntiAffinity` for pods that use `hostNetwork`.

size/L
priority/high
lifecycle/stale
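The four steps in the #2492 fix can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the plugin's actual code. It assumes that sshd inside each `hostNetwork` pod listens on a non-default port, because such a pod shares the node's network namespace, where port 22 is usually already taken. It also uses a made-up label key, `example.com/ssh-host-network`. The helper names `render_ssh_config` and `host_network_anti_affinity` are hypothetical too.

```python
def render_ssh_config(hosts, port, user="root", key_path="/root/.ssh/id_rsa"):
    """Render one ssh_config Host block per peer pod (hypothetical helper).

    Assumption: sshd in each pod listens on `port` instead of 22, since a
    hostNetwork pod shares the node's network namespace.
    """
    blocks = []
    for host in hosts:
        blocks.append("\n".join([
            f"Host {host}",
            f"    HostName {host}",
            f"    Port {port}",          # the port opened for the container
            f"    User {user}",
            f"    IdentityFile {key_path}",
            "    StrictHostKeyChecking no",
            "    UserKnownHostsFile /dev/null",
        ]))
    return "\n\n".join(blocks) + "\n"


def host_network_anti_affinity(label_key, label_value):
    """Build a pod `affinity` fragment that keeps labelled pods on separate nodes.

    The label key/value are hypothetical; they stand for the label added
    to hostNetwork pods in step 3.
    """
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {label_key: label_value}},
                    # One matching pod per node, so host ports cannot collide.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


if __name__ == "__main__":
    print(render_ssh_config(["job-worker-0", "job-worker-1"], 2222))
    print(host_network_anti_affinity("example.com/ssh-host-network", "true"))
```

With `kubernetes.io/hostname` as the `topologyKey`, the scheduler refuses to place two matching pods on the same node, which is what prevents two `hostNetwork` pods from competing for the same sshd port.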

**Describe the feature:** Add [code-server](https://github.com/coder/code-server) support. **Why do you need this feature?** Although the VS Code remote plugin is already supported, sometimes it is more convenient to use...

type/feature 💡

**Is your feature request related to a problem? Please describe.** As far as I know, the current distributed optimizer in Megatron-LM implements ZeRO-1, but ZeRO-1 does not save enough GPU...

stale
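To see why ZeRO-1 alone may not save enough GPU memory, the per-GPU model-state accounting from the ZeRO paper can be worked through. That accounting assumes mixed-precision Adam: 2 bytes/parameter of fp16 weights, 2 bytes/parameter of fp16 gradients, and K = 12 bytes/parameter of fp32 optimizer state. The sketch below is a back-of-the-envelope model only. It ignores activations, communication buffers, and fragmentation, and it is not a measurement of Megatron-LM's distributed optimizer.

```python
def zero_model_state_bytes(num_params, dp_degree, stage, k=12):
    """Per-GPU bytes of model states under ZeRO stage 0-3.

    Stage 1 shards optimizer state, stage 2 also shards gradients,
    stage 3 also shards the fp16 weights across `dp_degree` ranks.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = k * num_params    # fp32 master weights + Adam moments
    if stage >= 1:
        optim /= dp_degree
    if stage >= 2:
        grads /= dp_degree
    if stage >= 3:
        params /= dp_degree
    return params + grads + optim


if __name__ == "__main__":
    # 7.5B parameters, data-parallel degree 64 (example from the ZeRO paper).
    for stage in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB")
```

For this example the script prints 120.0, 31.4, 16.6 and 1.9 GB for stages 0 to 3. Under ZeRO-1, the unsharded 4 bytes/parameter of weights and gradients become the floor once the data-parallel degree is large.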

The `policy-config-file` option (https://kubernetes.io/docs/reference/scheduling/policies/) was removed in Kubernetes 1.23.

**Is your feature request related to a problem? Please describe.** I use the Dvorak keyboard layout, and I cannot find a way to change the WSLg keyboard layout from QWERTY to...

enhancement
keyboard-layout

The Reservation plugin has been deleted, so this doc is no longer needed and would only confuse users. I also recommend cleaning up other outdated documents.

size/L
retest-not-required-docs-only

**What would you like to be added:** When a job policy retries a failed task, retry it in place instead of releasing its resources and retrying afterwards. ...

kind/feature

**Is your feature request related to a problem? Please describe.** I noticed that the black version used by Megatron-LM is very old and already behind the one used by NeMo: https://github.com/NVIDIA/NeMo/blob/b0f3138a6be7fab3175deb8935f8492aeb1445bd/pyproject.toml#L33 **Describe the solution...

stale

**What is the problem you're trying to solve?** Currently nothing validates a JobFlow or JobTemplate when it is created, which may cause the controller or scheduler to panic. **Describe...

kind/feature
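The kind of creation-time check this issue asks for can be sketched as a small validator that rejects a bad JobFlow before the controller or scheduler ever sees it. This is a hypothetical, simplified model, not Volcano's CRD schema or webhook. Each flow is a dict such as `{"name": "a", "depends_on": ["b"]}`, and each flow name is assumed to refer to an existing JobTemplate. The field name `depends_on` and the function `validate_jobflow` are illustrative only.

```python
def validate_jobflow(flows, template_names):
    """Return a list of validation errors for a simplified JobFlow spec."""
    errors = []
    seen = set()
    for f in flows:
        name = f.get("name", "")
        if not name:
            errors.append("flow with empty name")
        elif name in seen:
            errors.append(f"duplicate flow name {name!r}")
        seen.add(name)

    # Build the dependency graph, reporting references to unknown flows.
    graph = {}
    for f in flows:
        name = f.get("name", "")
        if name and name not in template_names:
            errors.append(f"flow {name!r} references missing template")
        deps = f.get("depends_on") or []
        for dep in deps:
            if dep not in seen:
                errors.append(f"flow {name!r} depends on unknown flow {dep!r}")
        graph[name] = [d for d in deps if d in seen]

    # Depth-first search with three colours to detect dependency cycles.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(node):
        color[node] = GRAY
        for dep in graph[node]:
            if color[dep] == GRAY:
                return True
            if color[dep] == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    if any(color[n] == WHITE and visit(n) for n in list(graph)):
        errors.append("dependency cycle detected")
    return errors
```

Rejecting the object at admission time with these messages is cheaper than letting a dangling dependency or a cycle reach code paths that assume a valid DAG.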