Fix COMPOSE_PARALLEL_LIMIT

Open pagelypete opened this issue 4 years ago • 14 comments

Description of the issue

There have already been multiple issues opened about this bug: https://github.com/docker/compose/issues/7486 https://github.com/docker/compose/issues/5864

A PR that appears to fix the issue already exists; it was posted with reference to #5864. Could this be checked, tested, and merged so this bug can finally be fixed in master?

The bug causes a fairly critical issue: it completely deadlocks compose and, in the worst case, also seems able to make interactions with the Docker daemon hang because of the number of open connections it holds.

The bug is worse when your compose file is dynamically generated and can contain a varying number of services. You then have to set a global, system-wide environment variable to an arbitrarily high number, which completely defeats the point of the variable in the first place (to reduce CPU usage for container operations).

Anecdotally, I have tested that PR on 1.28.5, and up/down/restart/etc. operations all seem to work. With a compose file containing 150 services, this deadlocks on master:

COMPOSE_PARALLEL_LIMIT=2 docker-compose up -d --remove-orphans

With the linked PR, it does not, and it actually functions as you would expect (essentially bringing up the containers sequentially).

Context information (for bug reports)

Output of docker-compose version

docker-compose version 1.28.5, build unknown
docker-py version: 4.4.4
CPython version: 3.8.6
OpenSSL version: OpenSSL 1.1.1f  31 Mar 2020

Steps to reproduce the issue

  1. Create a docker-compose.yml file with a lot of services (ideally 100+)
  2. Set COMPOSE_PARALLEL_LIMIT to 2 and run an up operation
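For anyone who wants to reproduce this, a compose file of that size can be generated rather than written by hand. The following is only a sketch: the service names (`svc001` ...), the `busybox` image, the `sleep` command and the `version: "3.8"` header are placeholders assumed for illustration and are not taken from the original report.

```python
# Hypothetical generator for a large compose file. The image, command and
# service names are illustrative placeholders, not from the original report.


def build_compose_yaml(service_count: int = 150) -> str:
    """Return compose YAML text declaring `service_count` trivial services."""
    lines = ['version: "3.8"', "services:"]
    for index in range(1, service_count + 1):
        lines.append(f"  svc{index:03d}:")
        lines.append("    image: busybox")
        lines.append('    command: ["sleep", "3600"]')
    return "\n".join(lines) + "\n"


def write_compose_file(path: str = "docker-compose.yml",
                       service_count: int = 150) -> None:
    """Write the generated YAML to `path`, overwriting any existing file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_compose_yaml(service_count))
```

Calling `write_compose_file()` in an empty directory and then running the command from the description should be enough to exercise step 2.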

Observed result

Compose will deadlock.

Expected result

Compose should bring up containers using the parallel limit.
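To make the expected behaviour concrete, here is a minimal sketch of a bounded worker pool: every service is eventually started, but no more than `limit` operations are in flight at any time. This is not Compose's implementation and not the code from the linked PR; `run_with_limit` and `ConcurrencyProbe` are hypothetical names, and the probe merely stands in for a Docker API call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(service_names, start_service, limit=2):
    """Call start_service for every name with at most `limit` calls in flight."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # list() drains the iterator so worker exceptions surface here.
        return list(pool.map(start_service, service_names))


class ConcurrencyProbe:
    """Stand-in for a Docker API call that records the peak concurrency seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, name):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.001)  # pretend to talk to the daemon
        with self._lock:
            self._active -= 1
        return name


probe = ConcurrencyProbe()
names = [f"svc{i:03d}" for i in range(1, 151)]
started = run_with_limit(names, probe, limit=2)
print(len(started), "services started, peak concurrency", probe.peak)
```

With `limit=2` the run finishes for all 150 names and the recorded peak never exceeds 2, which is the behaviour one would expect from COMPOSE_PARALLEL_LIMIT=2 instead of a deadlock.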

Stacktrace / full error message

N/A

Additional information

Not relevant

pagelypete avatar Mar 22 '21 11:03 pagelypete

Hello, we also have massive problems with docker-compose. It does not start containers on a host reliably. We use docker-compose as part of a CI/CD environment, and Jenkins jobs fail multiple times a day because docker-compose does not reliably return "success" when bringing a bunch of containers online on a remote host via ssh.

devZer0 avatar Apr 01 '21 13:04 devZer0

Now, months later, this is still open with no further comments. Is docker-compose orphaned/abandoned software that nobody cares for?

devZer0 avatar Sep 08 '21 12:09 devZer0

Have you tried Docker Compose v2? With a new golang codebase, I guess concurrency is better addressed.

ndeloof avatar Sep 08 '21 15:09 ndeloof

> Have you tried Docker Compose v2? With a new golang codebase, I guess concurrency is better addressed.

I personally have not tried v2; however, the issue here was the load caused by doing so many Docker operations at once, not any issue with the performance of compose itself. Is there a way to limit parallel Docker operations in v2 similar to COMPOSE_PARALLEL_LIMIT (except working properly)?

Either way, it would be really great to get this change merged. We have been using it in production for a long time now and have to build our own docker-compose just to use it. It really does seem to fix the parallel problem and make COMPOSE_PARALLEL_LIMIT work properly. Servers that would otherwise be hugely overloaded can instead work normally.

pagelypete avatar Sep 08 '21 16:09 pagelypete

@ndeloof I gave v2 a try and I don't see any way to limit parallelism, and while I can see that performance of compose itself has been hugely improved, I'm concerned about not seeing a way to limit what it sends to the Docker API in v2 at all (I grepped the source a little).

For example, if you have a compose file with 200 containers that can otherwise function fine on a system, running docker-compose up results in an absolutely massive load spike, whereas with a parallel limit in place it can be done more slowly but with no system load issues at all.

pagelypete avatar Sep 09 '21 09:09 pagelypete

We also cannot use v2, as there is no mechanism to limit parallelism. Our servers get totally overloaded if too many services are started in parallel, i.e. services fail to start correctly under too much load, so please either fix v1 or add COMPOSE_PARALLEL_LIMIT to v2.

devZer0 avatar Sep 09 '21 10:09 devZer0

We are also using the patch in production now, with good results so far.

devZer0 avatar Sep 14 '21 10:09 devZer0

i'm very sad that nobody cares and this is being ignored

devZer0 avatar Sep 24 '21 09:09 devZer0

We also ran into the issue of a massive spike in memory and CPU usage when building. A way to limit parallelism is a must-have for us.

ondravondra avatar Sep 29 '21 12:09 ondravondra

We also cannot use compose v2, as it just freezes our build server. We have more than 20 docker images in a docker compose file, and building them all at once just breaks everything. I noticed that buildkit has an option "max-parallelism" which should prevent that, but in our case "docker buildx bake" fails for our docker-compose file with "mapping values are not allowed in this context".

As far as I understand from the code, docker compose v2 is "hardcoded" to use the default buildkit docker driver, and I can't see any way to set max-parallelism for that instance. Would it be possible to pass a buildx builder to docker-compose to be used instead of the default one? That way we could at least create a builder with "docker buildx create --name bla --buildkitd-flags '--oci-worker-max-parallelism=1'" and pass it to docker compose, maybe with something like "docker compose build --builder bla". Alternatively, docker compose could just use the builder that is currently selected in buildx.

The buildx issue related to that is https://github.com/docker/buildx/issues/359

WolfspiritM avatar Oct 28 '21 01:10 WolfspiritM

> We also cannot use compose v2, as it just freezes our build server. We have more than 20 docker images in a docker compose file, and building them all at once just breaks everything. I noticed that buildkit has an option "max-parallelism" which should prevent that, but in our case "docker buildx bake" fails for our docker-compose file with "mapping values are not allowed in this context".
>
> As far as I understand from the code, docker compose v2 is "hardcoded" to use the default buildkit docker driver, and I can't see any way to set max-parallelism for that instance. Would it be possible to pass a buildx builder to docker-compose to be used instead of the default one? That way we could at least create a builder with "docker buildx create --name bla --buildkitd-flags '--oci-worker-max-parallelism=1'" and pass it to docker compose, maybe with something like "docker compose build --builder bla". Alternatively, docker compose could just use the builder that is currently selected in buildx.
>
> The buildx issue related to that is docker/buildx#359

It sounds like this would go some way towards fixing the issue with regard to building, but not when bringing containers up (which was actually the original use case that triggered the issue for us).

I think the biggest problem is getting compose devs to actually see this issue - there are so many open issues that I guess many get lost, and I suspect that since this one has been labelled as being a compose v1 issue, it may not get looked at any more. Because of this, I have opened this issue to request the feature in compose v2 - https://github.com/docker/compose/issues/8849

pagelypete avatar Oct 28 '21 09:10 pagelypete

With a complex compose stack I'm seeing frequent:

error getting credentials - err: fork/exec REMOVED/docker-credential-ecr-login: too many open files, out: ``

when running docker compose pull.

So an option to limit pull parallelism would be great.

johanneswuerbach avatar Mar 23 '22 12:03 johanneswuerbach

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 21 '22 09:09 stale[bot]

push

jeffrson avatar Sep 21 '22 10:09 jeffrson

This issue has been automatically closed because it had not recent activity during the stale period.

stale[bot] avatar Nov 02 '22 03:11 stale[bot]

stale bot sucks

not fixing bugs sucks, too.

devZer0 avatar Nov 02 '22 09:11 devZer0

Yep, quite annoying since @jeffrson bumped it right after the label was added...

pagelypete avatar Nov 02 '22 10:11 pagelypete

The issue for tracking the lack of support in Compose v2 for COMPOSE_PARALLEL_LIMIT is at #9091 and support was added in v2.15.

I'm going to lock this now to prevent further confusion because this issue was tracking a potential deadlock in Compose v1 when using COMPOSE_PARALLEL_LIMIT.

If you have issues with Compose v2.15+ and COMPOSE_PARALLEL_LIMIT, please create a new bug report.

milas avatar Jan 30 '23 18:01 milas
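For readers landing here from a search: on Compose v2.15 or later the variable is set the same way as in the v1 command from the original report. The helper below is a hedged sketch for scripted use (for example from CI); `compose_up_invocation` is a hypothetical name, and the `up -d --remove-orphans` arguments simply mirror the command in the description.

```python
import os


def compose_up_invocation(limit, base_env=None):
    """Return (argv, env) for `docker compose up` with a parallel limit set."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["COMPOSE_PARALLEL_LIMIT"] = str(limit)
    argv = ["docker", "compose", "up", "-d", "--remove-orphans"]
    return argv, env


argv, env = compose_up_invocation(2, base_env={"PATH": "/usr/bin"})
print(f"COMPOSE_PARALLEL_LIMIT={env['COMPOSE_PARALLEL_LIMIT']}", " ".join(argv))
```

The returned pair can be handed to `subprocess.run(argv, env=env, check=True)` on a host where Docker is installed; the sketch itself only builds the invocation and does not contact the daemon.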