ECR `image_id` pull fails with command-substituted `SKYPILOT_DOCKER_PASSWORD` due to dependency timing (Blocks PR #4871 use case)
The functionality introduced in PR #4871, which supports command substitution in SKYPILOT_DOCKER_PASSWORD (e.g., $(aws ecr get-login-password ...)), cannot be effectively used when resources.image_id points to an ECR image. The initial docker pull of this ECR image on the provisioned VM fails due to a dependency timing conflict.
Problem Details:
- SkyPilot provisions a base VM.
- When PR #4871's feature is used, SkyPilot attempts
docker loginby executing the command inSKYPILOT_DOCKER_PASSWORD, followed bydocker pullfor the specified ECRimage_id. This happens on the newly provisioned VM. - The critical issue is that SkyPilot's standard procedure for copying necessary user credentials, files, or tools (e.g.,
~/.aws/credentials, the AWS CLI itself if not pre-installed, or custom scripts which the password command might be) typically happens in a later provisioning stage. - Consequently, the command in
SKYPILOT_DOCKER_PASSWORDexecutes before its required dependencies are available on the VM. This leads to command failure,docker loginfailure, and ultimately, the ECRimage_idpull fails.
Impact on PR #4871:
This timing conflict is a significant blocker for using PR #4871's command substitution feature for a primary use case: dynamically authenticating to ECR (or similar registries) when the ECR image itself is the runtime environment specified via image_id. While the substitution mechanism itself might be in place, it fails in practice for image_id scenarios due to this provisioning order.
Expected Behavior/Resolution Path:
To enable PR #4871 for ECR image_id use cases, the dependencies (tools, configs, scripts) required by the SKYPILOT_DOCKER_PASSWORD command must be present and configured on the VM at the moment docker login is attempted for the initial image_id pull.
Potential solutions could involve:
- Modifying SkyPilot's provisioning sequence to ensure these specific dependencies are deployed to the VM before the
docker loginforimage_idis attempted. - Introducing a dedicated pre-flight mechanism for staging just these critical Docker authentication dependencies.
Addressing this is essential for PR #4871 to successfully support dynamic ECR authentication for containerized runtime environments.
CC'ing @cblmemo here
Both sounds a little bit hacky to me. Lets defer this issue for now.
In our project we handle ECR auth with this patch:
diff -ur sky/provision/docker_utils.py sky-patched/provision/docker_utils.py
--- sky/provision/docker_utils.py 2025-06-02 23:54:17
+++ sky-patched/provision/docker_utils.py 2025-06-16 15:57:46
@@ -234,6 +234,23 @@
f'--password {shlex.quote(docker_login_config.password)} '
f'{shlex.quote(docker_login_config.server)}',
wait_for_docker_daemon=True)
+ elif docker_login_config.server.endswith(".amazonaws.com") and ".ecr." in docker_login_config.server:
+ # Handle AWS ECR authentication
+ server = docker_login_config.server
+ if server.endswith(".amazonaws.com") and ".ecr." in server:
+ self._run('sudo apt install -y amazon-ecr-credential-helper')
+ # Create the docker config directory if it doesn't exist
+ self._run('mkdir -p ~/.docker')
+ # Create or update the docker config.json file with ECR credentials helper
+ config_json = {
+ "credHelpers": {
+ server: "ecr-login"
+ }
+ }
+ # Write the config to a temporary file and move it to the right location
+ import json
+ self._run(f'echo \'{json.dumps(config_json, indent=2)}\' > /tmp/docker_config.json && '
+ f'mv /tmp/docker_config.json ~/.docker/config.json')
elif docker_login_config.server.endswith('-docker.pkg.dev'):
# Docker image server is on GCR, we need to do additional setup
# to pull the image.
diff -ur sky/resources.py sky-patched/resources.py
--- sky/resources.py 2025-06-16 15:54:40
+++ sky-patched/resources.py 2025-06-16 15:57:08
@@ -1699,9 +1699,6 @@
add_if_not_none('labels', self.labels)
if self._autostop_config is not None:
config['autostop'] = self._autostop_config.to_yaml_config()
- if self._docker_login_config is not None:
- config['_docker_login_config'] = dataclasses.asdict(
- self._docker_login_config)
if self._docker_username_for_runpod is not None:
config['_docker_username_for_runpod'] = (
self._docker_username_for_runpod)
Thanks for sharing @berekuk -- would you like to submit a PR?
Thanks to @berekuk. We recently have a similar implementation in 1b3287a. Feel free to adopt our next release and remove this patch.