moby icon indicating copy to clipboard operation
moby copied to clipboard

Generate split subnets on-demand and properly shift bytes of IPv6 subnets

Open akerouanton opened this issue 2 years ago • 19 comments

I bundled these two commits together as the second one is easier to test with changes done in the first one. I can still break this PR into two parts and rewrite the unit test of the second commit to test it with the current master branch. Let me know if you prefer that (eg. if you want to cherry-pick this change into older releases or because it's a big diff to review).

- 1st commit

This commit resolves #40275 by implementing a custom iterator named NetworkSplitter. It splits a set of NetworkToSplit into smaller subnets on-demand by calling its Get method.

Prior to this change, the list of NetworkToSplit was split into smaller subnets when ConfigLocalScopeDefaultNetworks or ConfigGlobalScopeDefaultNetworks were called or when ipamutils package was loaded. When one of the Config function was called with an IPv6 net to split into small subnets all the available memory was consumed. For instance, fd00::/8 split into /96 would take ~5*10^27 bytes.

Although this change trades memory consumption for computation cost, the NetworkSplitter is used by libnetwork/ipam package in such a way that it only have to compute the address of the next subnet. When NetworkSplitter reach the end of NetworkToSplit, it's resetted by libnetwork/ipam only if there were some subnets released beforehand. In such case, ipam package might iterate over all the subnets before finding one available subnet. This is the worst-case but it shall not be really impactful as either many subnets exists (more than one host can handle) or not so many subnets exists and full iteration over NetworkSplitter is fast.

- 2nd commit

Prior to this change, IPv6 subnets generation was broken because of a bitwise shift overflowing uint64.

To solve this issue, addIntToIP has been changed to take an addend and computes two ordinals applied to respectively the lower 64 bits and the upper 64 bits of IPv6 addresses.

Nothing change for IPv4 (ie. upperOrdinal is always 0).

- How to verify it

I added unit tests to confirm these changes are okay. Moreover, I tried to run dockerd with following config:

{
	"log-level": "info",
	"ipv6": true,
	"experimental": true,
	"ip6tables": true,
	"fixed-cidr-v6": "2a01:e34:xxxx:yyyy::/64",
	"default-address-pools": [
		{ "base": "172.17.0.0/16", "size": 16 },
		{ "base": "172.18.0.0/16", "size": 16 },
		{ "base": "2a01:e34:xxxx:yyyy::/64", "size": 120 }
	]
}

When starting dockerd v20.10.10 with this config, the daemon crashes with this panic message message: runtime error: makeslice: cap out of range. When starting a patched version of dockerd, it doesn't crash.

- Description for the changelog

  • Generate split subnets on-demand to not consume all the available memory when small IPv6 subnets are created from big subnets (fixes #40275)
  • Properly shift bytes of IPv6 subnets (fixes #42801)

akerouanton avatar Nov 19 '21 02:11 akerouanton

Hey, awesome that somebody is giving ipv6 some love 👍 Any chance this is fixing #41438 aswell ?

Xyaren avatar Nov 25 '21 22:11 Xyaren

@Xyaren I still need a bit of time to test it properly but yeah, it should fix #41438 as (I believe) this is a consequence of the uint64 overflow :wink: Thanks for mentioning this issue :slightly_smiling_face:

akerouanton avatar Nov 26 '21 11:11 akerouanton

@thaJeztah PTAL :slightly_smiling_face: (I don't know who else I should ping. Maybe arkodg?)

akerouanton avatar Nov 29 '21 00:11 akerouanton

So.. about 5 month later... Is there anyone able to look at this please ?

Xyaren avatar Apr 29 '22 16:04 Xyaren

@akerouanton looks like some failures in a test (may need some extra options in it);

=== Failed
=== FAIL: github.com/docker/docker/daemon/config TestDaemonConfigurationMergeDefaultAddressPools/empty_config_file (0.00s)
    config_test.go:155: assertion failed: cannot handle unexported field at {[]*ipamutils.NetworkToSplit}[0].ipVersion:
        	"github.com/docker/docker/libnetwork/ipamutils".NetworkToSplit
        consider using a custom Comparer; if you control the implementation of type, you can also consider using an Exporter, AllowUnexported, or cmpopts.IgnoreUnexported
    --- FAIL: TestDaemonConfigurationMergeDefaultAddressPools/empty_config_file (0.00s)
=== FAIL: github.com/docker/docker/daemon/config TestDaemonConfigurationMergeDefaultAddressPools/config_file (0.00s)
    config_test.go:165: assertion failed: cannot handle unexported field at {[]*ipamutils.NetworkToSplit}[0].ipVersion:
        	"github.com/docker/docker/libnetwork/ipamutils".NetworkToSplit
        consider using a custom Comparer; if you control the implementation of type, you can also consider using an Exporter, AllowUnexported, or cmpopts.IgnoreUnexported
    --- FAIL: TestDaemonConfigurationMergeDefaultAddressPools/config_file (0.00s)

thaJeztah avatar Jun 07 '22 20:06 thaJeztah

@thaJeztah Ah, looks like a bad rebase. Should be fixed now.

akerouanton avatar Jun 11 '22 22:06 akerouanton

@thaJeztah Any chance to have another look at this? Would be really nice to have that fixed!

max-wittig avatar Sep 07 '22 12:09 max-wittig

@thaJeztah @cpuguy83 Friendly ping 😄 . We've added IPv6 networking support to the gitlab-runner, but we have no way to use it as IPv6 networking in Docker is still broken, when used in this way.

Is there any way I can support the efforts here?

max-wittig avatar Oct 10 '22 07:10 max-wittig

@akerouanton Would you mind to implement these small requested changes? It would really help us to use Docker with IPv6 🥺

max-wittig avatar Nov 01 '22 15:11 max-wittig