moby-ryuk icon indicating copy to clipboard operation
moby-ryuk copied to clipboard

feat: shutdown race resilience

Open stevenh opened this issue 1 year ago • 5 comments

A significant rewrite to ensure that we don't suffer from shutdown race conditions as the prune condition is met and additional resources are being created.

Previously this would remove resources that were still in use, now we retry if we detect new resources have been created within a window of the prune condition triggering.

This supports the following new environment configuration settings:

  • RYUK_REMOVE_RETRIES - The number of times to retry removing a resource.
  • RYUK_REQUEST_TIMEOUT - The timeout for any Docker requests.
  • RYUK_RETRY_OFFSET - The offset added to the start time of the prune pass that is used as the minimum resource creation time

Also bumps go to v1.22 and golangci-lint to v1.59.1.

stevenh avatar Jul 27 '24 20:07 stevenh

For some reason, I need to set up the loadConfig tests with RYUK_VERBOSE=false in order to make the tests pass locally:

tests := map[string]struct {
		setEnv   func(*testing.T)
		expected config
	}{
		"defaults": {
+			setEnv: func(t *testing.T) {
+				t.Helper()
+				t.Setenv("RYUK_VERBOSE", "false")
+			},
			expected: config{
				ConnectionTimeout:   time.Minute,
				Port:                8080,
				ReconnectionTimeout: time.Second * 10,
				RemoveRetries:       10,
				RequestTimeout:      time.Second * 10,
				RetryOffset:         -time.Second,
				ShutdownTimeout:     time.Minute * 10,
			},
		},
		"custom": {
			setEnv: func(t *testing.T) {
				t.Helper()
				t.Setenv("RYUK_PORT", "1234")
				t.Setenv("RYUK_CONNECTION_TIMEOUT", "2s")
				t.Setenv("RYUK_RECONNECTION_TIMEOUT", "3s")
				t.Setenv("RYUK_REQUEST_TIMEOUT", "4s")
				t.Setenv("RYUK_REMOVE_RETRIES", "5")
				t.Setenv("RYUK_RETRY_OFFSET", "-6s")
				t.Setenv("RYUK_SHUTDOWN_TIMEOUT", "7s")
+				t.Setenv("RYUK_VERBOSE", "false")
			},
			expected: config{
				Port:                1234,
				ConnectionTimeout:   time.Second * 2,
				ReconnectionTimeout: time.Second * 3,
				RequestTimeout:      time.Second * 4,
				RemoveRetries:       5,
				RetryOffset:         -time.Second * 6,
				ShutdownTimeout:     time.Second * 7,
			},
		},
	}

A bug in the env library for default booleans?

mdelapenya avatar Aug 21 '24 11:08 mdelapenya

For some reason, I need to set up the loadConfig tests with RYUK_VERBOSE=false in order to make the tests pass locally:

A bug in the env library for default booleans?

Interesting what's the error you get?

stevenh avatar Aug 21 '24 18:08 stevenh

For some reason, I need to set up the loadConfig tests with RYUK_VERBOSE=false in order to make the tests pass locally:

snip...

A bug in the env library for default booleans?

If you had environment vars set then the test would have failed as it was expecting a clean environment.

I've updated the test to clean the current environment before it starts, which could fix the issue you saw.

stevenh avatar Sep 03 '24 12:09 stevenh

Hi @stevenh, thanks for this PR, I've started the review and I see that there are some changes that would deserve its own PR:

  • the addition of a library to read the env vars. Is there any other library you considered for that? I like this one is simple and with zero deps.
  • the bumps for the different Go versions and tools (golangci-lint)
  • the real refactor

Could you separate them into three? I can help out with the two first so you can focus on the real meat here

mdelapenya avatar Sep 06 '24 11:09 mdelapenya

Ok so little more involved due to go bump needed new linter so we need to merge in the following order:

  1. https://github.com/testcontainers/moby-ryuk/pull/160
  2. https://github.com/testcontainers/moby-ryuk/pull/159 -- Also includes version bump so it complies
  3. https://github.com/testcontainers/moby-ryuk/pull/141 -- Will need a rebase after the config merge
  4. https://github.com/testcontainers/moby-ryuk/pull/158 -- Currently erroring, but no point fixing

I didn't consider other env config libraries, as I've used that one a bunch of times and has worked well, but if there is something better happy to swap.

stevenh avatar Sep 07 '24 10:09 stevenh

Given #159 has been merged, could you please rebase this one? 🙏

mdelapenya avatar Sep 30 '24 14:09 mdelapenya

@mdelapenya added some docker cli based tests for container, network, image and volume reaping so should be good to go.

stevenh avatar Oct 03 '24 19:10 stevenh