
Update limits on overlay networks

Open hoerup opened this issue 4 years ago • 15 comments

A while ago there were issues with the load balancer (1) on large overlay networks, so a limit on overlay network size was added to the documentation in (2). This issue has since been resolved, and according to (3), the limitations are no longer necessary.

The documentation should be updated accordingly.

  1. https://github.com/moby/moby/issues/30820
  2. https://github.com/docker/docker.github.io/pull/5208
  3. https://github.com/moby/moby/pull/37372#issuecomment-414391171

hoerup, Sep 23 '20

@thaJeztah Is this true that a /24 is no longer required for an overlay network?

If so let me know and I can work on a PR for the docs.

The part of the docs in question:

You should create overlay networks with /24 blocks (the default), which limits you to 256 IP addresses, when you create networks using the default VIP-based endpoint-mode. This recommendation addresses limitations with swarm mode.

https://docs.docker.com/engine/reference/commandline/network_create/#overlay-network-limitations
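The quoted recommendation comes down to CIDR arithmetic. A /24 prefix leaves 32 − 24 = 8 host bits, which gives 2^8 = 256 addresses. A /16 leaves 16 host bits, which gives 65,536 addresses. Below is a minimal sketch using Python's standard `ipaddress` module. The helper name `subnet_capacity` is hypothetical. Subtracting only the network and broadcast addresses is an assumption, because Docker may take further addresses itself (the network gateway, for example).

```python
import ipaddress


def subnet_capacity(cidr: str) -> tuple[int, int]:
    """Return (total addresses, usable host addresses) for an IPv4 subnet.

    "Usable" subtracts only the network and broadcast addresses (the usual
    IPv4 convention); Docker may reserve more, e.g. the network gateway.
    """
    net = ipaddress.IPv4Network(cidr)
    total = net.num_addresses
    # /31 and /32 networks have no separate network/broadcast addresses.
    usable = total - 2 if net.prefixlen < 31 else total
    return total, usable


for cidr in ("10.0.0.0/24", "10.0.0.0/16"):
    total, usable = subnet_capacity(cidr)
    print(f"{cidr}: {total} addresses, at most {usable} usable")
```

Running it prints `10.0.0.0/24: 256 addresses, at most 254 usable` and `10.0.0.0/16: 65536 addresses, at most 65534 usable`.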

clintmod, Apr 06 '21

It would be good if the documentation stated in which version of Docker this changed.

chey, Sep 17 '21

There hasn't been any activity on this issue for a long time. If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment. If not, this issue will be closed in 14 days. This helps our maintainers focus on the active issues.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

docker-robott, Nov 29 '22

Still relevant

hoerup, Nov 29 '22

Any updates?

fannyfan414, Dec 24 '22

/remove-lifecycle stale

fannyfan414, Dec 24 '22

Any news?

fannyfan414, Feb 24 '23

There hasn't been any activity on this issue for a long time. If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment. If not, this issue will be closed in 14 days. This helps our maintainers focus on the active issues.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

docker-robott, May 25 '23

/remove-lifecycle stale

hoerup, May 25 '23

@akerouanton @dvdksn Is any of this covered in the networking rewrite? (Sorry still haven't found time to look at the PR 🙈)

thaJeztah, May 25 '23

We have not updated this bit yet, no. Sounds like we can just remove the Overlay network limitations section then?

dvdksn, May 25 '23

@dvdksn I don't think so. The overlay driver actually uses a bridge interface internally to connect containers co-located on the same host. As a result, the limitations I asked you to add to the bridge doc page also apply to the overlay driver.

akerouanton, May 25 '23

So if somebody creates a /16 overlay network, what problems could they be facing?

Rush, Feb 15 '24

I'd guess nothing, until you hit the limit of 1024 interfaces, which is hard-coded in the kernel (ref).

But I will let @akerouanton correct me
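Below is a rough planning sketch that puts the two limits in this discussion side by side. It rests on several assumptions. The 1024-interface figure is taken at face value from the comment above, and the kernel's exact usable count may be slightly lower. Each container is assumed to use one bridge interface on its node and one address in the overlay subnet. The extra addresses Docker allocates (gateway, service VIPs, per-node load-balancer endpoints) are ignored. The function `overlay_headroom` is hypothetical and is not part of Docker.

```python
import ipaddress

# Interface limit per Linux bridge as quoted in the comment above; taken at
# face value here (the exact usable count may be slightly lower).
BRIDGE_INTERFACE_LIMIT = 1024


def overlay_headroom(cidr: str, nodes: int, containers_per_node: int) -> list[str]:
    """Return warnings for a planned overlay network layout (hypothetical helper).

    Assumes each container uses one bridge interface on its node and one
    address in the overlay subnet; Docker's own extra allocations are ignored.
    """
    net = ipaddress.IPv4Network(cidr)
    usable = net.num_addresses - 2 if net.prefixlen < 31 else net.num_addresses
    warnings = []
    # Per-node limit: every local container adds an interface to the bridge.
    if containers_per_node > BRIDGE_INTERFACE_LIMIT:
        warnings.append(
            f"{containers_per_node} containers per node exceeds the "
            f"{BRIDGE_INTERFACE_LIMIT}-interface bridge limit"
        )
    # Swarm-wide limit: every container on the network needs an address.
    total = nodes * containers_per_node
    if total > usable:
        warnings.append(f"{total} containers exceed the {usable} usable addresses in {cidr}")
    return warnings


print(overlay_headroom("10.0.0.0/24", nodes=3, containers_per_node=100))
print(overlay_headroom("10.0.0.0/16", nodes=3, containers_per_node=100))
```

For example, 3 nodes with 100 containers each fit in a /16 but not in a /24. By contrast, 2,000 containers on a single node exceed the bridge limit no matter how large the subnet is.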

dvdksn, Feb 15 '24

I only have 12 /24 overlay networks with encryption enabled, on one manager and two worker nodes. When I first boot up the boxes, I'm able to spin up containers until I hit a limit. After that, any subsequent containers won't start, even after I remove the old ones. All containers are stuck in the "ready" or "new" state:

    ID             NAME          IMAGE                   NODE   DESIRED STATE   CURRENT STATE        ERROR   PORTS
    vkri0f9lwtzi   docs_code.1   collabora/code:latest          Running         New 44 minutes ago

    ID             NAME        MODE         REPLICAS   IMAGE                   PORTS
    68uaoz7y3oiz   docs_code   replicated   0/1        collabora/code:latest

If I reboot the servers, I'm able to spin up the containers again. I've tried to clean up any lingering containers, but I can't free up enough resources to start these again. I'm really leaning towards this being a network limitation, although I don't think I'm pushing the envelope very much. I am running Traefik in front, but even if these 5 web apps are all on the Traefik network, there shouldn't be a problem. It's worth noting that on some other hardware I was getting a network allocation OOM error. I'm desperate to fix this: do I need to change to /16 segments, or try the dnsrr endpoint mode? I'm pretty new to Docker Swarm at scale and really need some help. I have people who want to push to production very shortly, and I can't even spin up all our containers or confidently restart them! I'm grateful for any help, friends :-)
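Before resizing subnets, you could check the address-exhaustion theory by comparing each network's capacity with the endpoints it reports. Below is a minimal sketch that assumes the `docker` CLI is on `PATH`. It reads the JSON array printed by `docker network inspect` and uses its `Name`, `IPAM.Config[].Subnet`, and `Containers` fields. For an overlay network, `Containers` should list only the endpoints on the node where the command runs, including the `lb-<network>` load-balancer endpoint, so run it on each node. The helper names `summarize_network` and `inspect_network` are hypothetical.

```python
import ipaddress
import json
import subprocess
import sys


def summarize_network(inspect_output: list) -> dict:
    """Summarize address usage from parsed `docker network inspect` JSON."""
    net = inspect_output[0]  # the CLI prints a JSON array, one entry per network
    configs = (net.get("IPAM") or {}).get("Config") or []
    subnets = [ipaddress.ip_network(c["Subnet"]) for c in configs if c.get("Subnet")]
    endpoints = net.get("Containers") or {}  # may be null when nothing is attached
    return {
        "name": net.get("Name"),
        "subnets": [str(s) for s in subnets],
        "capacity": sum(s.num_addresses for s in subnets),
        "local_endpoints": len(endpoints),
    }


def inspect_network(name: str) -> dict:
    """Run `docker network inspect NAME` and summarize the result."""
    result = subprocess.run(
        ["docker", "network", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return summarize_network(json.loads(result.stdout))


if __name__ == "__main__":
    # Pass one or more network names as command-line arguments.
    for network_name in sys.argv[1:]:
        print(inspect_network(network_name))
```

`capacity` counts every address in the subnet, including ones Docker reserves. Treat it as an upper bound, and compare it against the endpoint totals summed over all nodes rather than reading it as a precise free-address count.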

dounoit, Mar 02 '24