testcontainers-node icon indicating copy to clipboard operation
testcontainers-node copied to clipboard

[Feature Request] SSH Docker connection support

Open Frederick888 opened this issue 8 months ago • 9 comments

I currently connect to a remote Docker server to run Testcontainers over plain HTTP (with env var DOCKER_HOST=tcp://docker.lan:2375).

Docker is going to remove this option and users either have to use HTTPS or SSH. Since I only do this within my home network, (properly) manage a TLS certificate chain seems to be an overkill and it doesn't offer much security benefit either as long as I still run Docker in root-ful mode. So SSH is a better option for me.

However DOCKER_HOST=ssh://[email protected] did not work out of the box. Thankfully dockerode already supports SSH and it wasn't too much hassle to get it running:

diff --git a/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts b/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts
index 900c6af..e3c68f1 100644
--- a/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts
+++ b/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts
@@ -176,9 +176,11 @@ export class DockerContainerClient implements ContainerClient {
         since: opts?.since ?? 0,
       })
       .then(async (stream) => {
         const actualLogStream = stream as IncomingMessage;
-        actualLogStream.socket?.unref();
+        if (typeof actualLogStream.socket?.unref === "function") {
+          actualLogStream.socket.unref();
+        }
 
         const demuxedStream = await this.demuxStream(container.id, actualLogStream);
         demuxedStream.pipe(proxyStream);
         demuxedStream.on("error", (err) => proxyStream.emit("error", err));
diff --git a/packages/testcontainers/src/container-runtime/strategies/configuration-strategy.ts b/packages/testcontainers/src/container-runtime/strategies/configuration-strategy.ts
index 874a313..2b6f1a0 100644
--- a/packages/testcontainers/src/container-runtime/strategies/configuration-strategy.ts
+++ b/packages/testcontainers/src/container-runtime/strategies/configuration-strategy.ts
@@ -27,12 +27,20 @@ export class ConfigurationStrategy implements ContainerRuntimeClientStrategy {
     this.dockerCertPath = dockerCertPath;
 
     const dockerOptions: DockerOptions = {};
 
-    const { pathname, hostname, port } = new URL(this.dockerHost);
+    const { protocol, pathname, hostname, port } = new URL(this.dockerHost);
     if (hostname !== "") {
+      dockerOptions.protocol = protocol === "ssh:" ? "ssh" : undefined;
       dockerOptions.host = hostname;
       dockerOptions.port = port;
+      if (dockerOptions.protocol === "ssh") {
+        const key = await fs.readFile("/path/to/key");
+        dockerOptions.sshOptions = {
+          privateKey: key,
+          keepaliveInterval: 0,
+        };
+      }
     } else {
       dockerOptions.socketPath = pathname;
     }
 
diff --git a/packages/testcontainers/src/container-runtime/utils/resolve-host.ts b/packages/testcontainers/src/container-runtime/utils/resolve-host.ts
index 2dc18e6..da148b4 100644
--- a/packages/testcontainers/src/container-runtime/utils/resolve-host.ts
+++ b/packages/testcontainers/src/container-runtime/utils/resolve-host.ts
@@ -22,8 +22,9 @@ export const resolveHost = async (
   switch (protocol) {
     case "http:":
     case "https:":
     case "tcp:":
+    case "ssh:":
       return hostname;
     case "unix:":
     case "npipe:": {
       if (isInContainer()) {
diff --git a/packages/testcontainers/src/generic-container/started-generic-container.ts b/packages/testcontainers/src/generic-container/started-generic-container.ts
index 7bb052d..2e4d014 100644
--- a/packages/testcontainers/src/generic-container/started-generic-container.ts
+++ b/packages/testcontainers/src/generic-container/started-generic-container.ts
@@ -116,8 +116,13 @@ export class StartedGenericContainer implements StartedTestContainer {
     if (this.containerIsStopped) {
       await this.containerIsStopped();
     }
 
+    // @ts-expect-error yeah
+    log.info("active res: " + JSON.stringify(process.getActiveResourcesInfo()));
+    // @ts-expect-error yeah
+    log.info("active handles: " + JSON.stringify(process._getActiveHandles()));
+
     return new StoppedGenericContainer(this.container);
   }
 
   public getHost(): string {

However since the ssh2 Channel does not offer an unref() method, now my tests won't exit cleanly due to dangling TCP connections. Then I found https://github.com/mscdex/ssh2/issues/217 and tried patching docker-modem with

diff --git a/lib/ssh.js b/lib/ssh.js
index c4a14f7..025fbbe 100644
--- a/lib/ssh.js
+++ b/lib/ssh.js
@@ -7,8 +7,9 @@ module.exports = function (opt) {
 
   agent.createConnection = function (options, fn) {
     try {
       conn.once('ready', function () {
+        conn._sock.unref()
         conn.exec('docker system dial-stdio', function (err, stream) {
           if (err) {
             handleError(err , fn);
           }

...but this actually broke the setup where it exited even before tests started.

I'm not familiar with the code base here. Should I try closing dockerode somewhere else?

Btw apart from basic SSH support, I'm also interested in offering options to bypass SSH agent, specifying SSH key path, etc., as I'd like to use a different user with a different key file for all Docker connections (my regular user's on GPG agent with a YubiKey).

Frederick888 avatar May 08 '25 07:05 Frederick888

Hi @Frederick888,

Thanks for raising such a detailed issue. SSH support would be great to be added to this library.

I see you've made changes in the right places. In terms of dangling processes on exit, unless unref'ing is supported by the downstream libs (dockerode, ssh2), we may be out of luck, and need to add support there first.

I see from the issue you linked the following was said:

The reason is that by default in ssh2 v0.3.x, there is a ping interval that is set up. You can disable this by setting pingInterval: false in your config options. Or if you want to dig into the internals to disable the timer manually, you can do clearInterval(conn._pinger);.

In ssh2 v0.4.x, the keepalive timer is not on by default so unref()'ing the socket is enough.

Instead of patching conn._sock.unref(), could you try one of the suggestions from that comment? E.g having dockerode pass in pingInterval: false, or patching in clearInterval(conn._pinger);? Regardless if either of these work, they'd need to be proposed downstream to the corresponding lib.

cristianrgreco avatar May 08 '25 08:05 cristianrgreco

Another Q:

now my tests won't exit cleanly due to dangling TCP connections

Do the tests still not exit even if you stop the container? If not then at what point does Dockerode terminate the SSH connection if there is one?

cristianrgreco avatar May 08 '25 08:05 cristianrgreco

having dockerode pass in pingInterval: false

pingInterval was renamed to keepaliveInterval in https://github.com/mscdex/ssh2/commit/992296b4cab5fc84ec48a9886402282965e14c4a, which I already set to 0 in the patch :)

(Actually this was unnecessary as the default value changed from 60,000 to 0 too in the same commit.)

Do the tests still not exit even if you stop the container? If not then at what point does Dockerode terminate the SSH connection if there is one?

Ah, this may lead to somewhere! After the tests finished, I only had the ryuk container left, where I did a docker rm -f <ryuk container> and jest exited immediately.

Edit: My tests can exit cleanly with TESTCONTAINERS_RYUK_DISABLED=true

Frederick888 avatar May 08 '25 08:05 Frederick888

Ah that's interesting.

I'm not aware of any issues related to Ryuk hanging when using Docker via socket or HTTP. Could you confirm that Ryuk only becomes a problem when using SSH?

cristianrgreco avatar May 08 '25 09:05 cristianrgreco

Could you confirm that Ryuk only becomes a problem when using SSH?

Yes. I used testcontainers + ryuk on macOS with Docker Desktop's socket for a while without any issue. Then I migrated to a remote Docker server with tcp://, been on it probably even longer, and never encountered this problem before.

Btw I checked the output of sudo ss -antp on the server before running tests, when it got stuck, and after docker rm -f & test exited. The relevant diff between stuck vs. after was:

--- stuck.txt   2025-05-08 19:11:19.230584514 +1000
+++ after.txt   2025-05-08 19:11:32.221030918 +1000
@@ -65,7 +65,6 @@
-LISTEN    0      4096                0.0.0.0:32974              0.0.0.0:*     users:(("docker-proxy",pid=1167175,fd=7))
-ESTAB     0      0               x.x.x.x.97:22            x.x.x.x.198:57069 users:(("sshd-session",pid=1167229,fd=7),("sshd-session",pid=1167218,fd=7))
-ESTAB     0      0               x.x.x.x.97:22            x.x.x.x.198:57070 users:(("sshd-session",pid=1167230,fd=7),("sshd-session",pid=1167219,fd=7))
-LISTEN    0      4096                   [::]:32974                 [::]:*     users:(("docker-proxy",pid=1167182,fd=7))

32974 was the port that ryuk listened on so still it left us with the SSH connections. I'm not sure how these two things are connected.

PS: I'm probably gonna take off very soon but LMK if you any other info :)

Frederick888 avatar May 08 '25 09:05 Frederick888

That's a shame. It looks like when using SSH, that any container managed by Dockerode will keep the SSH connection alive, and thus hang the Node process.

Would need to find some workaround for Ryuk. Struggling to think of one off the top of my head.

cristianrgreco avatar May 08 '25 09:05 cristianrgreco

Haven't got too much time to dig into this but a small update:

I noticed that this issue only happened when it started a new ryuk container.

I patched reaper.ts with

      .withEnvironment({
        RYUK_CONNECTION_TIMEOUT: "10m",
        RYUK_RECONNECTION_TIMEOUT: "10m",
      })

Then subsequent test executions, as long as the ryuk container was still alive for reuse, all could exit cleanly.

I'm still baffled by the difference between ryuk containers and other containers. I thought maybe it was something along the line of GenericContainer of other containers -> RyukReaper -> GenericContainer of ryuk -> Dockerode, but RyukReaper doesn't seem to hold references GenericContainer in any way whatsoever.

And I managed to put together a small reproducer at https://github.com/Frederick888/playground/tree/fzhang/tc-ssh. Apply my patch in the issue description, build TC-node, then in the reproducer run yarn link --all /path/to/testcontainers-node. Finally run yarn jest with either DOCKER_HOST=tcp://... or DOCKER_HOST=ssh:// to see the results.

Frederick888 avatar May 12 '25 05:05 Frederick888

Admittedly it's quite hacky but this seems to do the trick:

diff --git a/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts b/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts
index 900c6af..97f52d7 100644
--- a/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts
+++ b/packages/testcontainers/src/container-runtime/clients/container/docker-container-client.ts
@@ -176,9 +176,16 @@ export class DockerContainerClient implements ContainerClient {
         since: opts?.since ?? 0,
       })
       .then(async (stream) => {
         const actualLogStream = stream as IncomingMessage;
-        actualLogStream.socket?.unref();
+        if (typeof actualLogStream.socket?.unref === "function") {
+          actualLogStream.socket?.unref();
+        }
+        // @ts-expect-error ssh2 internals
+        if (typeof actualLogStream.socket?._client?._sock?.unref === "function") {
+          // @ts-expect-error ssh2 internals
+          actualLogStream.socket?._client?._sock?.unref();
+        }
 
         const demuxedStream = await this.demuxStream(container.id, actualLogStream);
         demuxedStream.pipe(proxyStream);
         demuxedStream.on("error", (err) => proxyStream.emit("error", err));

didn't need to patch any dependencies.

Frederick888 avatar May 15 '25 05:05 Frederick888

Great find, the issue raised to dockerode also seems good.

actualLogStream.socket?._client?._sock?.unref();

Seems the fix should be made downstream, and then things should just work.

cristianrgreco avatar May 15 '25 09:05 cristianrgreco