Drop all custom docker image requirements
Current
dstack allows running custom Docker images by specifying them in the image property. However, not all images can be used. These are some of the image requirements:
- The software in the image should allow running as root
- The image should have either `apt-get` or `yum`
- The image should have `/bin/sh`
- etc.
Proposed
Drop all image requirements and support all valid Docker images, including images built FROM scratch.
Implementation notes
The main source of requirements seems to be the installation and configuration of the OpenSSH server. Possible solutions to dropping the requirements related to the OpenSSH server include:
- Shipping a statically-linked OpenSSH server binary that would allow running without root privileges and would not need a package manager for installation.
- Using an alternative SSH server implementation in Go, so that the server could be part of the `dstack-runner` binary.
Some examples of images that don't work and their respective errors:
- `nvcr.io/nim/meta/llama3-8b-instruct:latest` (or any other image with a non-root user) when run on RunPod or Vast.ai: never starts, killed by provisioning timeout
- `prom/prometheus`: `Error: Distribution not supported`
- `fedora`: `sed: can't read /root/.profile: No such file or directory`
- `gcr.io/etcd-development/etcd:v3.4.34`: `exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown`
- `bitnami/thanos`: `unable to find user root: no matching entries in passwd file`
Action Plan (WIP)
Completing the plan would make it possible to (at least):
- run non-root images on backends where we cannot override the container user (e.g., NIM on RunPod);
- run non-`deb`/`rpm`-based images.
Keep the default image user
- [x] Get `USER` from the Docker image, as it's already done for `ENTRYPOINT` and `CMD` (see `JobConfigurator`), store it as `JobSpec.user`.
- [x] (Optional) Add a new `user` property to the run configurations to override the default image user.
- [ ] (Optional) If the `user` property is set and not equal to the default image user, exclude offers from backends where we cannot override the container user (RunPod, Vast.ai).
- [x] Start the container as `root` (if possible) to ensure that both the runner and the SSH server have sufficient permissions.
- [x] [runner] Execute the job with `Cmd.SysProcAttr.Credential.{Uid,Gid}` set according to the `JobSpec.user`.
- [x] [runner] Put SSH public keys into both `USER`/`user`'s and `root`'s `~/.ssh/authorized_keys`.
- [ ] [CLI] Use `USER`/`user` instead of `root` in the `~/.dstack/ssh/config` (`ssh run_name` → log in as a default/overridden user, `ssh root@run_name` → log in as root) if it's possible to log in as `user` (the user has a home dir and a proper login shell, not `nologin`/`false`).
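The last item's login check ("has a home dir and a proper login shell") could look roughly like the sketch below. It is only an illustration, not the actual CLI or runner code: `PasswdEntry`, `parse_passwd`, `find_user`, and `can_log_in` are hypothetical names, and the sketch assumes the image's `/etc/passwd` contents are already available as text. The user value follows Docker's `USER` format (`<name-or-uid>[:<group-or-gid>]`).

```python
from dataclasses import dataclass
from typing import List, Optional

# Shells that indicate an account is not meant for interactive login.
NON_LOGIN_SHELLS = {"nologin", "false"}


@dataclass
class PasswdEntry:
    name: str
    uid: int
    gid: int
    home: str
    shell: str


def parse_passwd(text: str) -> List[PasswdEntry]:
    """Parse /etc/passwd-style text, skipping malformed lines."""
    entries = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 7 or not fields[2].isdigit() or not fields[3].isdigit():
            continue
        name, _, uid, gid, _, home, shell = fields
        entries.append(PasswdEntry(name, int(uid), int(gid), home, shell))
    return entries


def find_user(entries: List[PasswdEntry], user: str) -> Optional[PasswdEntry]:
    """Look up a Docker-style USER value (name or numeric uid, optional :group)."""
    user = user.split(":", 1)[0]
    for entry in entries:
        if entry.name == user or (user.isdigit() and entry.uid == int(user)):
            return entry
    return None


def can_log_in(entry: Optional[PasswdEntry]) -> bool:
    """True if the user exists, has a home dir, and has a real login shell.

    An empty shell field is treated as "cannot log in"; this is a
    conservative choice made for this sketch only.
    """
    if entry is None or not entry.home or not entry.shell:
        return False
    return entry.shell.rsplit("/", 1)[-1] not in NON_LOGIN_SHELLS
```

With a check like this, the CLI could fall back to `root` in the generated SSH config whenever `can_log_in` returns `False` for the job user.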
Download the runner
- [x] On shim-enabled instances, download the runner (and the SSH server if OpenSSH is used) with shim.
- [ ] On backends without shim, try all possible tools (GNU Wget, Busybox Wget, cURL, `urllib.urlopen`, etc.) to download the runner/SSH server and fail if none available.
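The "try all possible tools" fallback could be sketched as follows. In practice this logic would more likely live in the container's startup script, so treat it as an illustration of the ordering only. `pick_download_command` is a hypothetical helper, the `which` parameter exists so the lookup can be faked, and the Python fallback uses Python 3's `urllib.request.urlretrieve` as a stand-in for the `urllib.urlopen` mentioned above.

```python
import shutil
from typing import Callable, List, Optional

# One-liner used when only a Python 3 interpreter is available.
_PY_DOWNLOAD = (
    "import sys, urllib.request; "
    "urllib.request.urlretrieve(sys.argv[1], sys.argv[2])"
)


def pick_download_command(
    url: str,
    dest: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the argv of the first available download tool, or raise."""
    if which("wget"):
        # -q and -O are accepted by both GNU Wget and the BusyBox applet.
        return ["wget", "-q", "-O", dest, url]
    if which("busybox"):
        # BusyBox may be present without a `wget` symlink.
        return ["busybox", "wget", "-q", "-O", dest, url]
    if which("curl"):
        return ["curl", "-fsSL", "-o", dest, url]
    if which("python3"):
        return ["python3", "-c", _PY_DOWNLOAD, url, dest]
    raise RuntimeError("no download tool available in the image")
```

The returned list can be passed to `subprocess.run(..., check=True)`. The explicit `RuntimeError` corresponds to the "fail if none available" part of the item.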
Bring our own SSH server
The choice between statically linked OpenSSH, Dropbear, and a `crypto/ssh`-based Go implementation embedded into the runner is yet to be decided.
- [ ] [runner] If the runner is started by `root`, configure `root` SSH access. In addition, if `JobSpec.user` != `root`, configure non-root SSH access. In any case, `JobSpec.user` is the default SSH user (that is, `JobSpec.user` is the `User` in the SSH client config generated by `dstack` client).
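The rule in the item above can be written out as a small decision function. This is a sketch under assumptions, not the runner's actual logic: `SSHAccessPlan`, `plan_ssh_access`, and `render_client_config` are hypothetical names, and the case of a runner not started by `root` (where only the job user's access is configured) is this sketch's own reading, since the item does not spell it out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSHAccessPlan:
    default_user: str    # value of `User` in the generated SSH client config
    accounts: List[str]  # accounts for which SSH access is configured


def plan_ssh_access(runner_uid: int, job_user: str) -> SSHAccessPlan:
    """Decide which accounts get SSH access and which one is the default."""
    accounts: List[str] = []
    if runner_uid == 0:
        # Runner started by root: root access is always configured.
        accounts.append("root")
    if job_user not in accounts:
        # Non-root job user (or, by assumption, a runner without root).
        accounts.append(job_user)
    return SSHAccessPlan(default_user=job_user, accounts=accounts)


def render_client_config(run_name: str, plan: SSHAccessPlan) -> str:
    """Render a minimal SSH client config entry for the run."""
    return f"Host {run_name}\n    User {plan.default_user}\n"
```

For a root-started runner and a job user `app`, the plan configures both `root` and `app`, with `app` as the default, so `ssh run_name` logs in as `app` while `ssh root@run_name` still works.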
(Optional) Images without *nix userland
- [ ] [runner] Bring our own shell and tools (BusyBox).
This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.