
ch-image: allow concurrent builds

Open qwofford opened this issue 3 years ago • 2 comments

I'd like to get some thoughts on how to accommodate build concurrency in Charliecloud with the new build cache. Today, with v0.28, I keep a separate CH_IMAGE_STORAGE directory for each container I'm building in parallel, so I have to duplicate base images before the parallel build can start. Ideally, I'd like to build something on the order of 10 containers in parallel. Those containers are the base for other containers, also on the order of 10, and the problem compounds from there.

I could:

  1. Tolerate the cost of duplicating a base image for all the containers that I'm building. I have to pull the base image for each recipe that requires it, increasing load on container registries and increasing build pipeline complexity (provisioning and tearing down storage dirs for each container built in parallel). Keeping all of these base images around pushes me out of most tmpfs capacities.
  2. Serialize the container assembly portion of my build pipelines so I don't have to worry about parallel builds.
  3. Perhaps turning off the build cache (the default mode) could also turn off the concurrency lock, enabling parallel builds until some other solution is available?
  4. Something else?

qwofford avatar Jul 05 '22 18:07 qwofford

So proper locking turns out to be harder than I thought. I tried a relatively naive approach to lock either “the whole storage directory” or “an individual image” (PR #1417) that did not work. I believe options to accomplish this include:

  1. Real multi-granularity locking. If we assume we only need exclusive locks, it’s possible that Linux’s shared locks could stand in for IX (intention exclusive) locks. I haven’t thought this through.

  2. Flat locking, with one lock per image plus another for “everything else”. Advantage: simpler. Disadvantages: (a) many locks since there can be a lot of images; (b) new images can be created during the process of locking all the images (is this a problem? I don’t know).
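A minimal sketch of how option 1’s shared-lock-as-IX idea might look with `fcntl`, assuming hypothetical lock-file names (this is not Charliecloud’s actual layout):

```python
# Sketch only: multi-granularity locking where a shared lock (LOCK_SH) on
# a storage-level lock file stands in for IX. File names are hypothetical.
import fcntl
import os

def lock_image(storage, image):
    """Lock one image for building: "IX" (shared) on the storage dir,
       then X (exclusive) on the image. Both file objects must stay open
       for as long as the locks are needed."""
    storage_fp = open(os.path.join(storage, "lock"), "w+")
    fcntl.lockf(storage_fp, fcntl.LOCK_SH)            # intention lock
    image_fp = open(os.path.join(storage, "lock.%s" % image), "w+")
    fcntl.lockf(image_fp, fcntl.LOCK_EX)              # real exclusive lock
    return (storage_fp, image_fp)

def lock_storage(storage):
    """Lock the whole storage directory exclusively; this conflicts with
       every builder's shared lock, so it waits until no builds are active."""
    fp = open(os.path.join(storage, "lock"), "w+")
    fcntl.lockf(fp, fcntl.LOCK_EX)
    return fp
```

One wrinkle: POSIX record locks are per-process, so two `lockf()` calls within the same process never conflict; a scheme like this only isolates separate build processes.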

Lock crabbing, as used in B-trees, is no good here because B-tree nodes don't contain one another in the way our resources do.
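A rough sketch of the flat scheme in option 2, again assuming hypothetical lock-file names; acquiring locks in a fixed global order means two builders can’t deadlock:

```python
# Sketch only: flat locking, one lock file per image plus one "misc" lock
# for everything else. File names are hypothetical.
import fcntl
import os

def lock_all(storage, image_names):
    """Exclusively lock the misc lock and then every image lock, in
       sorted order (a consistent global acquisition order prevents
       deadlock). Returns the open file objects, which must stay open
       while the locks are held."""
    fps = []
    for name in ["misc"] + sorted(image_names):
        fp = open(os.path.join(storage, "lock.%s" % name), "w+")
        fcntl.lockf(fp, fcntl.LOCK_EX)    # blocks until available
        fps.append(fp)
    return fps

def unlock_all(fps):
    for fp in fps:
        fcntl.lockf(fp, fcntl.LOCK_UN)
        fp.close()
```

Disadvantage (b) could perhaps be mitigated by re-listing the images after `lock_all()` returns and retrying if any new image appeared in the meantime.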

reidpr avatar Jul 29 '22 15:07 reidpr

Re: option 2, some quick timings:

Python 3.6 on NFS:

$ python3 -m timeit -s "import fcntl" "fp = open('foo', 'w'); fcntl.lockf(fp, fcntl.LOCK_EX); fp.close()"
100 loops, best of 3: 2.6 msec per loop

Python 3.9 on tmpfs:

$ python3 -m timeit -s "import fcntl" "fp = open('foo', 'w'); fcntl.lockf(fp, fcntl.LOCK_EX); fp.close()"
10000 loops, best of 5: 24.5 usec per loop
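Extrapolating from these per-lock numbers, a back-of-the-envelope estimate of option 2’s total locking cost (the figures are the measurements above, not guarantees for any particular filesystem):

```python
# Rough extrapolation from the timings above: cost of taking n image
# locks plus the one "everything else" lock under option 2.
PER_LOCK = {"NFS": 2.6e-3, "tmpfs": 24.5e-6}   # seconds per lockf() cycle

def total_lock_time(n_images, fs):
    return (n_images + 1) * PER_LOCK[fs]

# 1000 images: about 2.6 s on NFS, about 25 ms on tmpfs.
```

So even with “a lot of images”, the many-locks disadvantage looks tolerable on tmpfs and merely noticeable on NFS.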

reidpr avatar Jul 29 '22 16:07 reidpr