data-transfer-hub
ECR DTH failed to copy image: read tcp xxxx->161.189.44.22:443: read: connection timed out
Hello,
We created an ECR DTH task to sync ECR images from the us-east-1 region to the cn-northwest-1 region. We have been periodically receiving AWS notification emails since the task was created. The email stated:
{"error":"Failed to copy image","execution":"7e7944e9-e302-0881-0ec7-6f66e027abf5_947d8293-b762-89ea-4a32-f6485e72464a","image":"<repository_name>","tag":"<tag>"}
We verified that the image digest in the target region matches the image digest in the source region. We also traced the ECS logs and found this error:
time="2022-09-12T20:42:44Z" level=fatal msg="trying to reuse blob sha256:bbab4ec87ac4f89eaabdf68dddbd1dd930e3ad43bded38d761b89abf9389a893 at destination: Head \"https://<aws_china_account_id>.dkr.ecr.cn-northwest-1.amazonaws.com.cn/v2/<repository_name>/blobs/sha256:bbab4ec87ac4f89eaabdf68dddbd1dd930e3ad43bded38d761b89abf9389a893\": read tcp 10.0.0.XX:52920->161.189.44.22:443: read: connection timed out"
However, a connection timed out error in the logs was not always followed by an AWS notification email. We don't know whether the connection timeout is the root cause of the "Failed to copy image" error, and we are not sure how to solve the timeout issue.
Template version v1.0.3
Hi @solaamy, every transfer job is designed to retry at most 3 times. If all 3 retries fail, an alarm email will be sent.
Since the data is transferred through the public internet, its transmission performance will be affected by the network environment, and sometimes a timeout error will occur.
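To illustrate why a single timeout in the logs does not always produce an email, here is a minimal Python sketch of the retry-then-alarm behavior described above. It is not the actual DTH implementation: `run_with_retries`, `job`, and `notify` are hypothetical names, and treating "at most 3 times" as 3 attempts in total is an assumption.

```python
from typing import Callable


def run_with_retries(
    job: Callable[[], None],
    notify: Callable[[str], None],
    max_attempts: int = 3,  # assumption: 3 attempts in total
) -> bool:
    """Run `job` up to `max_attempts` times; alert once if every attempt fails."""
    last_error = None
    for _ in range(max_attempts):
        try:
            job()
            return True  # success: no alarm, even if earlier attempts timed out
        except Exception as exc:  # e.g. "read: connection timed out"
            last_error = exc
    # Only reached when all attempts failed.
    notify(f"Failed to copy image after {max_attempts} attempts: {last_error}")
    return False
```

Under this model, a timeout on the first or second attempt shows up in the ECS logs, but an email is sent only when the final attempt also fails.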
Here is a workaround:
- Go to the Amazon EventBridge rule for the ECR comparison job and change its schedule from the default of 1 day to 1 hour. Images that failed to copy are then picked up again within the hour, which may resolve the timeout issue by adding more retries at the job level.
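If you prefer to script this change instead of using the console, the following is a rough sketch using boto3. The rule name is a placeholder you need to look up in your own deployment, and `rate_expression` and `set_rule_interval` are hypothetical helper names. Because `put_rule` replaces the rule definition, the sketch reads the current rule first and carries over its state, description, and role.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build an EventBridge rate expression such as 'rate(1 hour)'."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    # EventBridge expects the singular unit for 1 and the plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def set_rule_interval(events_client, rule_name: str, value: int, unit: str) -> str:
    """Update the schedule of an existing rule, keeping its other settings."""
    current = events_client.describe_rule(Name=rule_name)
    params = {
        "Name": rule_name,
        "ScheduleExpression": rate_expression(value, unit),
        "State": current.get("State", "ENABLED"),
    }
    # Carry over optional fields so the update does not drop them.
    for key in ("Description", "RoleArn"):
        if current.get(key):
            params[key] = current[key]
    events_client.put_rule(**params)
    return params["ScheduleExpression"]


# Example usage (requires boto3 and AWS credentials; the rule name is a placeholder):
#   import boto3
#   set_rule_interval(boto3.client("events"), "<your-ecr-comparison-rule>", 1, "hour")
```

Targets attached to the rule are managed separately and are not changed by this call.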
Due to no further updates, we are closing this issue. If you have any new questions or concerns, please create a new one.