amazon-ecs-agent

Use resolved digest for image pulls

Open amogh09 opened this issue 1 year ago • 0 comments

Summary

This PR updates the image pull logic to use a resolved image manifest digest if one is available. Image manifest digests are resolved during the container's transition to the MANIFEST_PULLED state. The change ensures that the pulled image is the one that the resolved digest points to.

Implementation details

  • Add a new method TagImage to the DockerClient interface and its implementation. The method tags an image on the host. The implementation retries using a new ConstantBackoff strategy that backs off for the same duration every time. A constant backoff retry strategy is fine in this case because no external service is involved.
  • Add a new ConstantBackoff backoff strategy under the ecs-agent module. The strategy returns the same backoff duration regardless of how many times its Duration method is called.
  • Update the *DockerTaskEngine.pullAndUpdateContainerReference method, which is used for pulling container images, so that it uses the container's ImageDigest field to prepare a canonical reference to the image to be pulled. If a different image reference was used to pull the image, the method tags the pulled image with Container.Image so that image caching and image cleanup continue to work as before.
  • Add a new method GetCanonicalRef to the agent/utils/reference package that returns a canonical image reference given an image reference and a manifest digest.
  • Test updates and new tests.

Testing

  • Added a new integration test named TestPullContainerWithAndWithoutDigestInteg to check that *DockerTaskEngine.pullContainer can pull images for containers with and without an ImageDigest set.
  • Added a new integration test named TestPullContainerWithAndWithoutDigestConsistency to check that *DockerTaskEngine.pullContainer pulls the same image with or without a digest set and that the image can be inspected using the container.Image field in both cases.

In addition to the integration tests above, performed the following manual tests.

  • Ran a variety of tasks with Agent configured to use the always image pull behavior. Checked that all tasks ran as expected. Images were pulled using digests and tagged with the image reference in the task definition. Images were cleaned up without any issues.
  • Ran a variety of tasks with Agent configured to use the once and then the prefer-cached image pull behavior. Checked that all tasks ran as expected. Cached images were used in both cases when found. Image cleanup worked as expected with the once image pull behavior. Image pull is disabled when the prefer-cached image pull behavior is used.
  • Ran a simple task multiple times with an Agent built with the changes in this PR and again with an Agent built from the master branch. Both Agents were configured to use the always image pull behavior to force image pulls. Measured the average task start time (startedAt - createdAt) and task pull time (pullStoppedAt - pullStartedAt) in both cases. The changes to resolve image manifest digests in https://github.com/aws/amazon-ecs-agent/pull/4152 added a delay to task start times that ranged from 300ms (ECR) to 900ms (Dockerhub). With this PR, however, image pulls are slightly faster: pull time for an image that's already available on the host drops from ~700ms (Dockerhub), ~250ms (public ECR), and ~130ms (private ECR) to ~260ms (Dockerhub), ~100ms (public ECR), and ~50ms (private ECR). Combined with the changes to resolve image manifest digests (#4152), the overall average increase in task start time in my test environment is ~500ms (Dockerhub) and ~150ms (ECR).
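
As a rough sanity check on the figures above, the net change can be estimated as the digest-resolution delay minus the pull-time saving. This is back-of-the-envelope arithmetic only, not a measurement: `netChangeMs` is a hypothetical helper, and the sketch assumes the 300ms delay applies to both public and private ECR and the 900ms delay to Dockerhub.

```go
package main

import "fmt"

// netChangeMs estimates the net change in task start time in milliseconds:
// the added digest-resolution delay minus the reduction in pull time.
func netChangeMs(resolveDelay, pullBefore, pullAfter int) int {
	return resolveDelay - (pullBefore - pullAfter)
}

func main() {
	// Inputs are the approximate figures quoted in the testing notes.
	fmt.Println("Dockerhub:", netChangeMs(900, 700, 260))  // 900 - 440 = 460
	fmt.Println("public ECR:", netChangeMs(300, 250, 100)) // 300 - 150 = 150
	fmt.Println("private ECR:", netChangeMs(300, 130, 50)) // 300 - 80 = 220
}
```

The estimates land in the same range as the reported ~500ms (Dockerhub) and ~150ms (ECR) averages, which were measured directly rather than derived this way.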

New tests cover the changes: yes

Description for the changelog

Does this PR include breaking model changes? If so, have you added transformation functions?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

amogh09, Apr 25 '24 00:04