testcontainers-java icon indicating copy to clipboard operation
testcontainers-java copied to clipboard

NetworkMode=host causes starting issue

Open rgugliel-da opened this issue 3 years ago • 20 comments

I use the following code to start a container:

private class CustomOracleContainer
    extends GenericContainer[CustomOracleContainer](
      DockerImageName.parse("mycompany/oracle:enterprise-19.14.0-preloaded-20220120-27-36701e3")
    )

private val oracleContainer: CustomOracleContainer = new CustomOracleContainer()
  .withExposedPorts(1521)
  .withNetworkMode("host")
  .waitingFor(Wait.forLogMessage("DATABASE IS READY TO USE!\\n", 1))
  .withStartupTimeout(Duration.ofSeconds(160))

With testcontainers 1.15.1, it works. When updating to 1.16.3, I get the following error:

ERROR ?.1.0-preloaded-20220120-27-36701e3] - Could not start container
org.testcontainers.shaded.org.awaitility.core.ConditionTimeoutException: Lambda expression in org.testcontainers.containers.GenericContainer: expected the predicate to return <true> but it returned <false> for input of <InspectContainerResponse(...))> within 5 seconds.
	at org.testcontainers.shaded.org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:164)
	at org.testcontainers.shaded.org.awaitility.core.AbstractHamcrestCondition.await(AbstractHamcrestCondition.java:86)
	at org.testcontainers.shaded.org.awaitility.core.ConditionFactory.until(ConditionFactory.java:939)
	at org.testcontainers.shaded.org.awaitility.core.ConditionFactory.until(ConditionFactory.java:645)

Any idea what's happening ?

rgugliel-da avatar Mar 04 '22 13:03 rgugliel-da

I have the same issue... For me as a workaround, I copy-pasted the GenericContainer -class (and dependent classes) from Git, commit of version 1.16.3 - and "reverted" the changes of Commit "Remove withPublishAllPorts from Ryuk and stabilize containerInfo content on start (#4263)"; 034daa6e from 20.07.21 in the class mentioned above. These are the lines where the exception is thrown... But the container is running anyway!

The problem has something to do with starting the container with networkMode=host.

public class ShowHostNetworkProblemTest {

    @Test
    void startWithHostNetwork_fails() throws Exception {
        MariaDBContainer mariaDBContainer = new MyMariaDBContainer("mariadb:10.1.48");
        mariaDBContainer.withNetworkMode("host");

        try {
            mariaDBContainer.start();

        } catch (ContainerLaunchException ex) {
            // noop
        } finally {
            Connection connection = mariaDBContainer.createConnection("");
            Statement statement = connection.createStatement();
            ResultSet resultSet = statement.executeQuery("SELECT 2;");
            resultSet.next();
            int anInt = resultSet.getInt(1);
            assertEquals(2, anInt);

            System.out.println("Container is running!!");
        }

        assertNotNull(mariaDBContainer);
    }

    public static class MyMariaDBContainer extends MariaDBContainer<MyMariaDBContainer> {
        MyMariaDBContainer(String dockerImageName) {
            super(dockerImageName);
        }

        @Override
        public Integer getMappedPort(int originalPort) {
            if ("host".equals(getNetworkMode())) {
                return originalPort;
            } else {
                return super.getMappedPort(originalPort);
            }
        }
    }
}

I hope the fix will come with the next version, so I can remove my workaround-copy-pasted-classes!

pretep22 avatar Mar 07 '22 14:03 pretep22

I could confirm the issue and it might be the case for any container exposing ports while using network mode host.

kiview avatar Mar 08 '22 14:03 kiview

I am also affected by this in a certain scenario. Is there any way I can help with this issue? (Information about the env, code, etc.?)

javahippie avatar Jun 20 '22 16:06 javahippie

I think we would need to debug why the check that was introduced with https://github.com/testcontainers/testcontainers-java/commit/034daa6e3d4a99c4ce71332ac49bf66957f903db (which did fix certain race condition scenarios, especially on Windows) seems to not apply if network mode is set to host.

kiview avatar Jun 20 '22 19:06 kiview

Looking into this now. What I've found so far is:

GenericContainer.start waits for all the exposed ports to be available in the output of docker inspect (or more accurately the corresponding Docker Engine API route) as part of this snippet of code:

// Wait until inspect container returns the mapped ports
containerInfo =
  await()
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(DynamicPollInterval.ofMillis(50))
    .pollInSameThread()
    .until(
        () -> dockerClient.inspectContainerCmd(containerId).exec(),
        inspectContainerResponse -> {
            Set<Integer> exposedAndMappedPorts = inspectContainerResponse
                .getNetworkSettings()
                .getPorts()
                .getBindings()
                .entrySet()
                .stream()
                .filter(it -> Objects.nonNull(it.getValue())) // filter out exposed but not yet mapped
                .map(Entry::getKey)
                .map(ExposedPort::getPort)
                .collect(Collectors.toSet());
  
            return exposedAndMappedPorts.containsAll(exposedPorts);
        }
    );

The issue is, however, that published (exposed) ports are discarded on network mode host. E.g.

$ docker run --publish 8080:80 --network host --detach nginx
WARNING: Published ports are discarded when using host network mode
1da8e93a7de7011dff3a523e54d751de669988f8edcfea9f5c3c3cf54fefe0bc

In the above example the code expects docker inspect to return:

    {
        ...
        "NetworkSettings": {
           ...
            "Ports": {
                "80/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "8080"
                    },
                    {
                        "HostIp": "::",
                        "HostPort": "8080"
                    }
                ]
            }
        }
    }

But in reality, docker inspect returns:

    {
        ...
        "NetworkSettings": {
           ...
            "Ports": {}
        }
    }

So the container times out and throws an exception. From here, I'm planning to look into the reasoning behind the above code snippet and potential fixes.

aidando73 avatar Aug 20 '22 02:08 aidando73

@REslim30 If I remember correctly, parts of this code were refactored in order to remove certain race conditions on Docker Desktop (particularly Docker Desktop on Windows). Your findings make sense, this is likely the issue.

kiview avatar Aug 22 '22 12:08 kiview

I've been testing locally and I think there's a simple fix we could do. We could add to ContainerState.getMappedPort an additional condition:

    default Integer getMappedPort(int originalPort) {
        ...
        Ports.Binding[] binding = new Ports.Binding[0];
        final InspectContainerResponse containerInfo = this.getContainerInfo();
        if (containerInfo != null) {
            if (containerInfo.getHostConfig().getNetworkMode().equals("host")) {
                return originalPort;
            }
            binding = containerInfo.getNetworkSettings().getPorts().getBindings().get(new ExposedPort(originalPort));
        }
        ...
    }

Which simply returns the original port if the container is in host mode. Though this feels a bit wrong. host mode doesn't really have a concept of mapped ports - the ports are simply on the host. It seems like we'd be stretching this method beyond it's original abstraction.

The more semantically correct alternative would be to throw if getMappedPort is called when container is host mode:

    default Integer getMappedPort(int originalPort) {
        ...
        Ports.Binding[] binding = new Ports.Binding[0];
        final InspectContainerResponse containerInfo = this.getContainerInfo();
        if (containerInfo != null) {
            if (containerInfo.getHostConfig().getNetworkMode().equals("host")) {
                throw new IllegalArgumentException("getMappedPort is not supported for host network mode");
            }
            binding = containerInfo.getNetworkSettings().getPorts().getBindings().get(new ExposedPort(originalPort));
        }
        ...
    }

We would, however, need to check all references to getMapped port and add in a check for host mode and there's a lot: Screenshot from 2022-08-25 20-27-41

aidando73 avatar Aug 25 '22 11:08 aidando73

Thanks for the investigation @REslim30, this is super helpful to proceed with this.

Both your points are valid and solution 1 seems very pragmatic. And we can improve on this by documenting this behavior in the Javadoc. The blast radius of solution 2 feels too high for me, to be really feasible.

Before talking more about the potential solutions, I would like to circle back to understand the use cases for which to use a container with host network mode in the context of Testcontainers. Can you please give some context @rgugliel-da @javahippie?

kiview avatar Aug 25 '22 12:08 kiview

We need that as part of the quite specific pipeline. We use Testcontainer to run an Oracle DB for testing on CI (Oracle enterprise). The reason we use Testcontainer and not directly the Oracle container is because of technicalities of our CI platform. This nesting of containers creates issue with queries issue to the DB except if we use network=nost.

rgugliel-da avatar Aug 26 '22 09:08 rgugliel-da

@rgugliel-da Thanks for sharing the context. So this is for solving an issue when using a containerized CI and you using Docker-Socket-Mounting to start sibling containers?

I think I don't yet understand how the network setup affects querying the database in this case. Did you consider putting the sibling container on the same Docker network as the containerized test JVM and don't use dynamic port mapping at all, but rely on fixed port and Docker DNS?

kiview avatar Aug 26 '22 09:08 kiview

Hey @rgugliel-da, any chance to follow up on my previous question? Or alternatively @javahippie?

kiview avatar Sep 14 '22 12:09 kiview

Hi @kiview , I am very sorry for the delay in my response!

So this is for solving an issue when using a containerized CI and you using Docker-Socket-Mounting to start sibling containers?

That's correct.

Did you consider putting the sibling container on the same Docker network as the containerized test JVM and don't use dynamic port mapping at all, but rely on fixed port and Docker DNS?

No, I did not. I went for the current approach because I thought I could use similar setup both locally and on CI.

rgugliel-da avatar Sep 14 '22 13:09 rgugliel-da

No, I did not. I went for the current approach because I thought I could use similar setup both locally and on CI.

I was thinking about using Testcontainers' Advanced Networking Feature. Is there an in the communication of 2 containers started by Testcontainers with each other, or is it about the JVM process communicating with the Oracle container?

Edit: Added a napkin drawing to visualize the 2 different scenarios 😅 image

kiview avatar Sep 14 '22 13:09 kiview

Situation A. From the build container, I run my JVM (sbt) and from there I start a testcontainer running Oracle

rgugliel-da avatar Sep 15 '22 14:09 rgugliel-da

Thank you, then it seems we indeed need to support the network mode. I still don't understand why the Oracle container requires it in this setup, but since it worked in old Testcontainers versions we introduced a regression for this feature and I believe we should try to restore the fuctionality.

kiview avatar Sep 16 '22 11:09 kiview

@REslim30 Would you like to contribute the fix?

Regarding your 2nd proposal, what if we implement your second proposal, but throw a runtime exception, so we don't have to handle all usages of getMappedPort? (and we avoid a breaking change like this)

kiview avatar Sep 16 '22 14:09 kiview

Sure, give me one sec.

aidando73 avatar Sep 17 '22 00:09 aidando73

@rgugliel-da, @javahippie @pretep22, can I confirm what environments you're on?

The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server.

Docker docs

I'm considering whether or not to throw an exception (or at least a warning) on non-linux environments when trying to use host mode.

aidando73 avatar Sep 17 '22 02:09 aidando73

Ok I've submitted a PR for this.

As for:

I'm considering whether or not to throw an exception (or at least a warning) on non-linux environments when trying to use host mode.

I've added a separate issue for this. I think there's value in it. https://github.com/testcontainers/testcontainers-java/issues/5853

aidando73 avatar Sep 17 '22 06:09 aidando73

@rgugliel-da, @javahippie @pretep22, can I confirm what environments you're on?

The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server.

Docker docs

I'm considering whether or not to throw an exception (or at least a warning) on non-linux environments when trying to use host mode.

I use Ubuntu.

rgugliel-da avatar Sep 19 '22 05:09 rgugliel-da

We close this as not planned, see comment in PR: https://github.com/testcontainers/testcontainers-java/pull/5852#issuecomment-1318812304

However, we don't want to support network mode host through Testcontainers, for 2 main reasons:

  1. Network mode host is only available on Linux (as also reflected in your implementation) and test code using it would therefore not be portable.
  2. Containers started in network mode host share the same network namespace, which basically means they will use fixed ports. This is not aligned with the Testcontainers philosophy of avoiding test failures through port conflicts (meaning environment state).

kiview avatar Nov 17 '22 15:11 kiview

I'd like to clarify, network mode host still works as expected if not used in conjunction with withExposedPorts(). This also makes sense, because when using network mode host, there is no need to expose ports through the TC API, since the container is accessible through the host network.

kiview avatar Mar 01 '23 09:03 kiview