
Drop all custom docker image requirements

Open jvstme opened this issue 1 year ago • 3 comments

Current

dstack allows running custom Docker images by specifying them in the image property. However, not all images can be used. These are some of the image requirements:

  • The software in the image should allow running as root
  • The image should have either apt-get or yum
  • The image should have /bin/sh
  • etc.

Proposed

Drop all image requirements and support all valid Docker images, including images built FROM scratch.

Implementation notes

The main source of requirements seems to be the installation and configuration of the OpenSSH server. Possible ways to drop the OpenSSH-related requirements include:

  • Shipping a statically-linked OpenSSH server binary that would allow running without root privileges and would not need a package manager for installation.
  • Using an alternative SSH server implementation in Go, so that the server could be part of the dstack-runner binary.

jvstme, Aug 11 '24 21:08

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Sep 11 '24 01:09

Some examples of images that don't work and their respective errors:

  • nvcr.io/nim/meta/llama3-8b-instruct:latest (or any other images with a non-root user) when run on RunPod or Vast.ai - never starts, killed by provisioning timeout
  • prom/prometheus - Error: Distribution not supported
  • fedora - sed: can't read /root/.profile: No such file or directory
  • gcr.io/etcd-development/etcd:v3.4.34 - exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown
  • bitnami/thanos - unable to find user root: no matching entries in passwd file

jvstme, Oct 22 '24 10:10

Action Plan (WIP)

Completing the plan would make it possible to (at least):

  • run non-root images on backends where we cannot override the container user (e.g., NIM on RunPod);
  • run non-deb/rpm-based images.

Keep the default image user

  • [x] Get USER from the Docker image, as it's already done for ENTRYPOINT and CMD (see JobConfigurator), store it as JobSpec.user.
  • [x] (Optional) Add a new user property to the run configurations to override the default image user.
  • [ ] (Optional) If the user property is set and not equal to the default image user, exclude offers from backends where we cannot override the container user (RunPod, Vast.ai).
  • [x] Start the container as root (if possible) to ensure that both the runner and the SSH server have sufficient permissions.
  • [x] [runner] Execute the job with Cmd.SysProcAttr.Credential.{Uid,Gid} set according to the JobSpec.user.
  • [x] [runner] Put SSH public keys into both USER/user's and root's ~/.ssh/authorized_keys.
  • [ ] [CLI] Use USER/user instead of root in ~/.dstack/ssh/config when it's possible to log in as that user, that is, when the user has a home dir and a proper login shell (not nologin/false). Then ssh run_name logs in as the default/overridden user, and ssh root@run_name logs in as root.
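
The runner step above (executing the job with Cmd.SysProcAttr.Credential set from JobSpec.user) could look roughly like the following. This is only a sketch, not the actual dstack-runner code: the helper names parseNumericUser and commandAsUser are hypothetical, only the numeric "uid" and "uid:gid" forms of USER are handled, and defaulting the gid to 0 is a simplification (a real implementation would resolve names and primary groups from the image's passwd and group files). It relies on syscall.Credential, so it builds on Unix-like systems only.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

// parseNumericUser parses a Docker-style USER value of the form "uid" or
// "uid:gid". Named users and groups are rejected: resolving them needs a
// passwd/group lookup, which this sketch leaves out.
// Simplification: when no gid is given, gid 0 is used.
func parseNumericUser(spec string) (uint32, uint32, error) {
	userPart, groupPart, hasGroup := strings.Cut(spec, ":")
	uid, err := strconv.ParseUint(userPart, 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("user %q is not numeric: %w", userPart, err)
	}
	gid := uint64(0)
	if hasGroup {
		gid, err = strconv.ParseUint(groupPart, 10, 32)
		if err != nil {
			return 0, 0, fmt.Errorf("group %q is not numeric: %w", groupPart, err)
		}
	}
	return uint32(uid), uint32(gid), nil
}

// commandAsUser builds (but does not start) a command that will run with the
// given uid/gid. Starting it succeeds only if the calling process is allowed
// to switch to those credentials, which is why the container starts as root.
func commandAsUser(spec string, name string, args ...string) (*exec.Cmd, error) {
	uid, gid, err := parseNumericUser(spec)
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{Uid: uid, Gid: gid},
	}
	return cmd, nil
}

func main() {
	cmd, err := commandAsUser("1000:1000", "id")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	cred := cmd.SysProcAttr.Credential
	fmt.Printf("would run %v as uid=%d gid=%d\n", cmd.Args, cred.Uid, cred.Gid)
}
```

The command is only constructed here, so the sketch can be run without root; calling cmd.Run() as an unprivileged user with a different uid would fail with a permission error.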

Download the runner

  • [x] On shim-enabled instances, download the runner (and the SSH server if OpenSSH is used) with shim.
  • [ ] On backends without shim, try all possible tools (GNU Wget, BusyBox Wget, cURL, Python's urllib.request.urlopen, etc.) to download the runner/SSH server and fail if none is available.
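
The "try every available tool" fallback for backends without shim can be illustrated as follows. This is a sketch of the selection logic only: on such backends none of our binaries are in the container yet, so the real check would more likely be a shell one-liner or happen on the server side. The names downloader, candidates, downloadCommand and errNoTool, the candidate order, and the example URL and path are all hypothetical. The tool flags themselves (curl -fsSL -o, wget -q -O, python3 -c with urllib.request.urlretrieve) are standard.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// errNoTool is returned when none of the candidate tools is found.
var errNoTool = errors.New("no download tool available in the image")

// downloader describes one candidate tool and how to build its arguments.
type downloader struct {
	tool string
	args func(url, dest string) []string
}

// candidates are tried in order. The "wget" entry uses only flags that both
// GNU Wget and BusyBox Wget accept.
var candidates = []downloader{
	{"curl", func(url, dest string) []string {
		return []string{"-fsSL", "-o", dest, url}
	}},
	{"wget", func(url, dest string) []string {
		return []string{"-q", "-O", dest, url}
	}},
	{"python3", func(url, dest string) []string {
		script := "import sys, urllib.request; " +
			"urllib.request.urlretrieve(sys.argv[1], sys.argv[2])"
		return []string{"-c", script, url, dest}
	}},
}

// downloadCommand returns the argv of the first available tool, or errNoTool.
// lookPath is injected so the selection can be exercised without the tools.
func downloadCommand(lookPath func(string) (string, error), url, dest string) ([]string, error) {
	for _, c := range candidates {
		path, err := lookPath(c.tool)
		if err != nil {
			continue // tool not present, try the next one
		}
		return append([]string{path}, c.args(url, dest)...), nil
	}
	return nil, errNoTool
}

func main() {
	// Placeholder URL and destination, not real dstack locations.
	argv, err := downloadCommand(exec.LookPath, "https://example.com/runner", "/tmp/runner")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would execute:", argv)
}
```

Failing with an explicit error when no tool is found matches the "fail if none is available" requirement, instead of letting the job hang until the provisioning timeout.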

Bring our own SSH server

Statically linked OpenSSH, Dropbear, or a crypto/ssh-based Go implementation embedded into the runner (yet to be decided).

  • [ ] [runner] If the runner is started by root, configure root SSH access. In addition, if JobSpec.user != root, configure non-root SSH access. In any case, JobSpec.user is the default SSH user (that is, JobSpec.user is the User in the SSH client config generated by dstack client).
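
One possible reading of the rule above, written out as a small decision function. This is a sketch under assumptions: the type sshAccess and the function planSSHAccess are hypothetical names, jobUser stands for JobSpec.user, and the case where the runner is not started by root is interpreted as "only the job user gets access".

```go
package main

import "fmt"

// sshAccess lists which accounts the SSH server should accept and which one
// the generated SSH client config should use by default.
type sshAccess struct {
	AllowedUsers []string
	DefaultUser  string
}

// planSSHAccess applies the rule from the action plan:
//   - runner started by root: configure root access;
//   - job user is not root: additionally configure access for that user;
//   - the job user is always the default SSH user.
func planSSHAccess(runnerIsRoot bool, jobUser string) sshAccess {
	var allowed []string
	if runnerIsRoot {
		allowed = append(allowed, "root")
	}
	if jobUser != "root" {
		allowed = append(allowed, jobUser)
	}
	return sshAccess{AllowedUsers: allowed, DefaultUser: jobUser}
}

func main() {
	fmt.Printf("%+v\n", planSSHAccess(true, "nim"))
	fmt.Printf("%+v\n", planSSHAccess(true, "root"))
	fmt.Printf("%+v\n", planSSHAccess(false, "nim"))
}
```

With this reading, ssh run_name maps to DefaultUser, while ssh root@run_name works only when "root" is among AllowedUsers.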

(Optional) Images without *nix userland

  • [ ] [runner] Bring our own shell and tools (BusyBox).

un-def, Oct 23 '24 14:10

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Apr 20 '25 02:04

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

github-actions[bot], May 04 '25 02:05