Performance issues when running in a Docker container on CircleCI
I'm running s3deploy on CircleCI using version 2.3.0. The performance is very poor, even though I've increased the size of my container to 8 vCPUs and 16 GB of RAM, and I've increased the number of s3deploy workers to 8. Often the process runs out of memory and is shut down by Circle; when it does succeed, it takes a very long time, nearly 10 minutes or more. My project is quite large, and the particular set of uploads that breaks involves lots of image files in several subfolders.
I've tried a variety of optimizations, in particular tightening the regexes in the config file and reducing the search paths (my hunch being that some deep recursion is happening and driving the heavy memory use).
I'm able to run the same routine on OS X on a machine with specs similar to the Docker container I'm running on CircleCI, and it runs very quickly in my local environment.
Also note that there are no new files that need uploading, so this slowness is not due to upload speeds. Are there any performance tips anyone can offer?
Actually, I now believe it may be related to resolving files that need to be updated. Is there a way to optimize this?
> Is there a way to optimize this?
Not likely. The algorithm used is pretty simple: we fetch metadata about the objects to do MD5 hash comparisons, which is obviously faster than uploading everything on every run, and we stream the files when hashing, so I don't see any obvious pitfalls. But then I'm just guessing.
I see. So likely the long time that it takes is a combination of fetching metadata plus calculating MD5s for so many files (many of which are larger images).
I've seen similar packages use ETags and store them in a locally cached JSON file that could be checked in to source control. This might be a more performant diffing strategy?
> This might be a more performant diffing strategy?
Sure, but it would be harder to get right (it would, for one, break if you build from different PCs/CI servers). Also, you are the first person to report problems in this department. It works fine for me, and I'm the primary user (I wrote this for myself), so it would be a hard sell to complicate this piece of software to solve problems that ... I don't have.
Also note that these ETags are stored in S3 as metadata, so I suspect it is the local MD5 hash calculation that would take time in this scenario, and that would not change with a local cache.
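To make the proposed trade-off concrete, a committed-to-the-repo cache like the one discussed could be sketched in Go roughly like this (all names here are hypothetical; this is not s3deploy's or go3up's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// etagCache is a hypothetical local cache mapping relative file
// paths to the MD5 last recorded for them.
type etagCache map[string]string

// loadCache reads the JSON cache file; a missing file yields an
// empty cache, which makes every file count as changed.
func loadCache(path string) (etagCache, error) {
	c := etagCache{}
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return c, nil
	}
	if err != nil {
		return nil, err
	}
	return c, json.Unmarshal(data, &c)
}

// changed returns the paths whose current MD5 differs from the
// cache, so only those need a remote metadata fetch or upload.
func changed(cache etagCache, current map[string]string) []string {
	var out []string
	for path, sum := range current {
		if cache[path] != sum {
			out = append(out, path)
		}
	}
	return out
}

func main() {
	cache := etagCache{"index.html": "aaa", "img/a.png": "bbb"}
	current := map[string]string{"index.html": "aaa", "img/a.png": "ccc"}
	fmt.Println(changed(cache, current)) // only img/a.png differs
}
```

The objection above applies directly to this sketch: the JSON file only reflects the last machine that wrote it, so builds from a different PC or CI server would diff against a stale cache.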
I'm getting the same issue:
uby/index.html (size) ↑ Killed
Exited with code 137
Hint: Exit code 137 typically means the process is killed because it was running out of memory
Hint: Check if you can optimize the memory usage in your app
Hint: Max memory usage of this container is 4288471040
according to /sys/fs/cgroup/memory/memory.max_usage_in_bytes
So, -force exists, but does it still do the MD5 comparison?
@rvangundy I ended up using go3up, which caches the MD5 lookup locally before running; it sped up my CircleCI runs to < 5 seconds with a fraction of the memory.
If you have a use case with a large number of files that don't change often, it seems like the best option 👍
@petems how do you know, using potentially old and stale cache files, that a file hasn't changed? I ask because I would love to improve on this if possible -- I guess it could be possible if S3 could report a "last modified" timestamp for the entire bucket.
For my use case I'm the only pusher/owner on the repo, so I can generate a new cache file, commit it to the repo, and that covers 90% of the deploy.
I only really have to do it once to cache the last 3 or so years of blog posts and images; after that I'm not that fussed about keeping it updated, as it's already sped things up.
If there's something I'm missing and a way to do the same with s3deploy, I'd be happy to hear it, as I've been using it fine for the last few years and would love to keep using it... but there's been some tipping point recently where my runs OOM in CircleCI (example here: https://circleci.com/gh/petems/petersouter.xyz/106)
NB: I should also probably try updating the docker image as it's using a version of s3deploy from 9 months ago, so need to check if it's still the case with a more recent build...
Looking at this now, I think there was a similar issue on Hugo's deploy command some time ago, I'll have a look.
Other than that, you may want to test reducing the number of workers e.g. s3deploy -workers 2.