Test http4s servers fail release bound port on shutdown when using fs2 3.12.0
This issue has been minorly discussed in discord in the cats-effect channel, but I have recently concluded that the most fruitful avenue for exploration may be in fs2 rather than cats-effect, although my suspicion is that the bug might be somewhere straddling the two libraries.
The general idea is that we have a number of integration tests that spin up http4s servers so the tests can run requests against them. The servers are run within Resource so we can guarantee that they shut down cleanly when the tests are finished. We are careful to not run tests in parallel and we've been setting the fork property so tests run in a forked JVM.
When cats-effect 3.6.0 came out, we upgraded cats-effect and fs2 and suddenly a number of our tests began failing with errors like:
==> X com.example.ce361.ReproSpec.shutdown 2 2.168s java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:565)
at sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:344)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:301)
at java.nio.channels.ServerSocketChannel.bind(ServerSocketChannel.java:224)
at fs2.io.net.SelectingSocketGroup.$anonfun$serverResource$7(SelectingSocketGroup.scala:112)
at delay @ fs2.io.net.SelectingSocketGroup.$anonfun$serverResource$6(SelectingSocketGroup.scala:107)
at traverse @ fs2.io.net.SelectingSocketGroup.$anonfun$serverResource$4(SelectingSocketGroup.scala:106)
at flatMap @ fs2.io.net.SelectingSocketGroup.$anonfun$serverResource$4(SelectingSocketGroup.scala:106)
at *>$extension @ org.typelevel.keypool.internal.RequestSemaphore$.apply(RequestSemaphore.scala:79)
Our interpretation of the failure is that at the end of the resource scope, when the http4s servers should have shut down, they have not unbound their ports in the OS, so when a new test starts up and tries to start a new server (on the same port, because we just use a fixed port), it cannot bind to the test port and fails. As a result, we have been stuck back on cats-effect 3.5.7 and fs2 3.11.0 since early this year, but it's becoming increasingly untenable to be in this situation.
I have made a repro case in this repository, mirroring a common basic setup for our tests:
https://github.com/mtomko/ce361
Note that if you use fs2 3.11.0, it seems to work fine, but when you bump to 3.12.0, it fails.
I'm happy to do some legwork here but I think without a bit of guidance I might be hard pressed to make any progress. I was planning to begin by looking at the polling change introduced in 3.12.0 but I'm not familiar with the internals of either fs2 or cats-effect so I have a pretty steep learning curve.
I am not an expert in these APIs but I noticed this code in SelectingSocketGroup.scala, in the definition of serverResource:
Resource
.make(F.delay(selector.provider.openServerSocketChannel())) { ch =>
F.delay(ch.close())
}
.evalMap { serverCh =>
val configure = address.traverse(_.resolve).flatMap { ip =>
F.delay {
serverCh.configureBlocking(false)
serverCh.bind(
new InetSocketAddress(
ip.map(_.toInetAddress).orNull,
port.map(_.value).getOrElse(0)
)
)
}
}
I'm curious because serverCh.bind is non-void - it returns something that can be closed, but it doesn't seem to be closed anywhere explicitly. It's possible that closing the channel created in the resource above will unbind serverCh too but what if it doesn't?
I tested this and it doesn't seem to pan out. Looking at the Javadocs, I finally found that bind returns the socket it was called on to allow for chaining, so if the channel is closed at all (which it should be since it's a resource) the socket should be unbound.
Also, including the output of sbt evicted for the fs2 3.11.0 case:
[info] Here are other dependency conflicts that were resolved:
[info] * org.scala-lang:scala-library:2.13.16 is selected over {2.13.14, 2.13.15, 2.13.8, 2.13.5, 2.13.12, 2.13.11}
[info] +- com.example:ce361_2.13:0.0.1-SNAPSHOT (depends on 2.13.16)
[info] +- io.circe:circe-numbers_2.13:0.14.13 (depends on 2.13.16)
[info] +- org.typelevel:cats-kernel_2.13:2.13.0 (depends on 2.13.16)
[info] +- org.typelevel:cats-core_2.13:2.13.0 (depends on 2.13.16)
[info] +- io.circe:circe-generic_2.13:0.14.13 (depends on 2.13.16)
[info] +- io.circe:circe-core_2.13:0.14.13 (depends on 2.13.16)
[info] +- com.chuusai:shapeless_2.13:2.3.13 (depends on 2.13.16)
[info] +- org.scala-lang:scala-reflect:2.13.16 (depends on 2.13.16)
[info] +- org.typelevel:vault_2.13:3.6.0 (depends on 2.13.14)
[info] +- org.typelevel:log4cats-slf4j_2.13:2.7.0 (depends on 2.13.14)
[info] +- org.typelevel:log4cats-core_2.13:2.7.0 (depends on 2.13.14)
[info] +- org.typelevel:literally_2.13:1.1.0 (depends on 2.13.8)
[info] +- org.typelevel:keypool_2.13:0.4.10 (depends on 2.13.14)
[info] +- org.typelevel:jawn-parser_2.13:1.5.1 (depends on 2.13.11)
[info] +- org.typelevel:jawn-fs2_2.13:2.4.0 (depends on 2.13.8)
[info] +- org.typelevel:cats-parse_2.13:1.0.0 (depends on 2.13.12)
[info] +- org.typelevel:cats-mtl_2.13:1.3.1 (depends on 2.13.8)
[info] +- org.typelevel:cats-effect_2.13:3.6.3 (depends on 2.13.15)
[info] +- org.typelevel:cats-effect-std_2.13:3.6.3 (depends on 2.13.15)
[info] +- org.typelevel:cats-effect-kernel_2.13:3.6.3 (depends on 2.13.15)
[info] +- org.typelevel:case-insensitive_2.13:1.4.2 (depends on 2.13.14)
[info] +- org.scodec:scodec-bits_2.13:1.1.38 (depends on 2.13.12)
[info] +- org.log4s:log4s_2.13:1.10.0 (depends on 2.13.5)
[info] +- org.http4s:http4s-server_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-jawn_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-ember-server_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-ember-core_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-ember-client_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-dsl_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 2.13.8)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-client_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-circe_2.13:0.23.30 (depends on 2.13.15)
[info] +- io.circe:circe-jawn_2.13:0.14.8 (depends on 2.13.14)
[info] +- com.comcast:ip4s-core_2.13:3.6.0 (depends on 2.13.14)
[info] +- co.fs2:fs2-io_2.13:3.11.0 (depends on 2.13.14)
[info] +- co.fs2:fs2-core_2.13:3.11.0 (depends on 2.13.14)
[info] * co.fs2:fs2-core_2.13:3.11.0 is selected over 3.3.0
[info] +- com.example:ce361_2.13:0.0.1-SNAPSHOT (depends on 3.11.0)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 3.11.0)
[info] +- co.fs2:fs2-io_2.13:3.11.0 (depends on 3.11.0)
[info] +- org.typelevel:jawn-fs2_2.13:2.4.0 (depends on 3.3.0)
[info] * io.circe:circe-core_2.13:0.14.13 is selected over {0.14.8}
[info] +- io.circe:circe-generic_2.13:0.14.13 (depends on 0.14.13)
[info] +- org.http4s:http4s-circe_2.13:0.23.30 (depends on 0.14.8)
[info] +- io.circe:circe-jawn_2.13:0.14.8 (depends on 0.14.8)
[info] * org.typelevel:cats-effect_2.13:3.6.3 is selected over {3.5.4}
[info] +- com.example:ce361_2.13:0.0.1-SNAPSHOT (depends on 3.6.3)
[info] +- org.typelevel:vault_2.13:3.6.0 (depends on 3.5.4)
[info] +- org.typelevel:log4cats-slf4j_2.13:2.7.0 (depends on 3.5.4)
[info] +- com.comcast:ip4s-core_2.13:3.6.0 (depends on 3.5.4)
[info] +- co.fs2:fs2-core_2.13:3.11.0 (depends on 3.5.4)
[info] * org.typelevel:cats-effect-std_2.13:3.6.3 is selected over {3.5.7, 3.5.4}
[info] +- org.typelevel:cats-effect_2.13:3.6.3 (depends on 3.6.3)
[info] +- org.typelevel:log4cats-core_2.13:2.7.0 (depends on 3.5.4)
[info] +- org.typelevel:keypool_2.13:0.4.10 (depends on 3.5.4)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 3.5.7)
[info] * org.scala-lang:scala-reflect:2.13.16 is selected over 2.13.12
[info] +- org.typelevel:cats-parse_2.13:1.0.0 (depends on 2.13.12)
[info] * org.scodec:scodec-bits_2.13:1.1.38 is selected over 1.1.34
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 1.1.38)
[info] +- co.fs2:fs2-core_2.13:3.11.0 (depends on 1.1.38)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 1.1.34)
[info] * org.typelevel:jawn-parser_2.13:1.5.1 is selected over 1.4.0
[info] +- io.circe:circe-jawn_2.13:0.14.8 (depends on 1.5.1)
[info] +- org.http4s:http4s-jawn_2.13:0.23.30 (depends on 1.5.1)
[info] +- org.typelevel:jawn-fs2_2.13:2.4.0 (depends on 1.4.0)
[info] * org.typelevel:cats-core_2.13:2.13.0 is selected over {2.11.0, 2.10.0, 2.8.0}
[info] +- io.circe:circe-core_2.13:0.14.13 (depends on 2.13.0)
[info] +- org.typelevel:vault_2.13:3.6.0 (depends on 2.11.0)
[info] +- org.typelevel:log4cats-core_2.13:2.7.0 (depends on 2.10.0)
[info] +- org.typelevel:keypool_2.13:0.4.10 (depends on 2.11.0)
[info] +- org.typelevel:cats-mtl_2.13:1.3.1 (depends on 2.8.0)
[info] +- org.typelevel:cats-effect-kernel_2.13:3.6.3 (depends on 2.11.0)
[info] +- org.typelevel:case-insensitive_2.13:1.4.2 (depends on 2.11.0)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 2.8.0)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 2.11.0)
[info] +- com.comcast:ip4s-core_2.13:3.6.0 (depends on 2.10.0)
[info] +- co.fs2:fs2-core_2.13:3.11.0 (depends on 2.11.0)
[info] * org.typelevel:cats-effect-kernel_2.13:3.6.3 is selected over 3.3.14
[info] +- org.typelevel:cats-effect-std_2.13:3.6.3 (depends on 3.6.3)
[info] +- org.typelevel:cats-effect_2.13:3.6.3 (depends on 3.6.3)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 3.3.14)
[info] * org.slf4j:slf4j-api:1.7.36 is selected over 1.7.30
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 1.7.36)
[info] +- org.typelevel:log4cats-slf4j_2.13:2.7.0 (depends on 1.7.36)
[info] +- org.log4s:log4s_2.13:1.10.0 (depends on 1.7.30)
and here it is for 3.12.0:
[info] Here are other dependency conflicts that were resolved:
[info] * org.scala-lang:scala-library:2.13.16 is selected over {2.13.14, 2.13.15, 2.13.8, 2.13.5, 2.13.12, 2.13.11}
[info] +- com.example:ce361_2.13:0.0.1-SNAPSHOT (depends on 2.13.16)
[info] +- io.circe:circe-numbers_2.13:0.14.13 (depends on 2.13.16)
[info] +- org.typelevel:cats-kernel_2.13:2.13.0 (depends on 2.13.16)
[info] +- co.fs2:fs2-io_2.13:3.12.0 (depends on 2.13.16)
[info] +- org.typelevel:cats-core_2.13:2.13.0 (depends on 2.13.16)
[info] +- io.circe:circe-generic_2.13:0.14.13 (depends on 2.13.16)
[info] +- co.fs2:fs2-core_2.13:3.12.0 (depends on 2.13.16)
[info] +- io.circe:circe-core_2.13:0.14.13 (depends on 2.13.16)
[info] +- com.chuusai:shapeless_2.13:2.3.13 (depends on 2.13.16)
[info] +- org.scala-lang:scala-reflect:2.13.16 (depends on 2.13.16)
[info] +- org.typelevel:vault_2.13:3.6.0 (depends on 2.13.14)
[info] +- org.typelevel:log4cats-slf4j_2.13:2.7.0 (depends on 2.13.14)
[info] +- org.typelevel:log4cats-core_2.13:2.7.0 (depends on 2.13.14)
[info] +- org.typelevel:literally_2.13:1.1.0 (depends on 2.13.8)
[info] +- org.typelevel:keypool_2.13:0.4.10 (depends on 2.13.14)
[info] +- org.typelevel:jawn-parser_2.13:1.5.1 (depends on 2.13.11)
[info] +- org.typelevel:jawn-fs2_2.13:2.4.0 (depends on 2.13.8)
[info] +- org.typelevel:cats-parse_2.13:1.0.0 (depends on 2.13.12)
[info] +- org.typelevel:cats-mtl_2.13:1.3.1 (depends on 2.13.8)
[info] +- org.typelevel:cats-effect_2.13:3.6.3 (depends on 2.13.15)
[info] +- org.typelevel:cats-effect-std_2.13:3.6.3 (depends on 2.13.15)
[info] +- org.typelevel:cats-effect-kernel_2.13:3.6.3 (depends on 2.13.15)
[info] +- org.typelevel:case-insensitive_2.13:1.4.2 (depends on 2.13.14)
[info] +- org.scodec:scodec-bits_2.13:1.1.38 (depends on 2.13.12)
[info] +- org.log4s:log4s_2.13:1.10.0 (depends on 2.13.5)
[info] +- org.http4s:http4s-server_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-jawn_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-ember-server_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-ember-core_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-ember-client_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-dsl_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 2.13.8)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-client_2.13:0.23.30 (depends on 2.13.15)
[info] +- org.http4s:http4s-circe_2.13:0.23.30 (depends on 2.13.15)
[info] +- io.circe:circe-jawn_2.13:0.14.8 (depends on 2.13.14)
[info] +- com.comcast:ip4s-core_2.13:3.6.0 (depends on 2.13.14)
[info] * co.fs2:fs2-core_2.13:3.12.0 is selected over {3.11.0, 3.3.0}
[info] +- com.example:ce361_2.13:0.0.1-SNAPSHOT (depends on 3.12.0)
[info] +- co.fs2:fs2-io_2.13:3.12.0 (depends on 3.12.0)
[info] +- org.typelevel:jawn-fs2_2.13:2.4.0 (depends on 3.3.0)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 3.11.0)
[info] * io.circe:circe-core_2.13:0.14.13 is selected over {0.14.8}
[info] +- io.circe:circe-generic_2.13:0.14.13 (depends on 0.14.13)
[info] +- org.http4s:http4s-circe_2.13:0.23.30 (depends on 0.14.8)
[info] +- io.circe:circe-jawn_2.13:0.14.8 (depends on 0.14.8)
[info] * org.typelevel:cats-effect_2.13:3.6.3 is selected over {3.6.0, 3.5.4}
[info] +- com.example:ce361_2.13:0.0.1-SNAPSHOT (depends on 3.6.3)
[info] +- org.typelevel:vault_2.13:3.6.0 (depends on 3.5.4)
[info] +- org.typelevel:log4cats-slf4j_2.13:2.7.0 (depends on 3.5.4)
[info] +- com.comcast:ip4s-core_2.13:3.6.0 (depends on 3.5.4)
[info] +- co.fs2:fs2-core_2.13:3.12.0 (depends on 3.6.0)
[info] * org.typelevel:cats-effect-std_2.13:3.6.3 is selected over {3.5.7, 3.5.4}
[info] +- org.typelevel:cats-effect_2.13:3.6.3 (depends on 3.6.3)
[info] +- org.typelevel:log4cats-core_2.13:2.7.0 (depends on 3.5.4)
[info] +- org.typelevel:keypool_2.13:0.4.10 (depends on 3.5.4)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 3.5.7)
[info] * org.scala-lang:scala-reflect:2.13.16 is selected over 2.13.12
[info] +- org.typelevel:cats-parse_2.13:1.0.0 (depends on 2.13.12)
[info] * org.scodec:scodec-bits_2.13:1.1.38 is selected over 1.1.34
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 1.1.38)
[info] +- co.fs2:fs2-core_2.13:3.12.0 (depends on 1.1.38)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 1.1.34)
[info] * org.typelevel:jawn-parser_2.13:1.5.1 is selected over 1.4.0
[info] +- io.circe:circe-jawn_2.13:0.14.8 (depends on 1.5.1)
[info] +- org.http4s:http4s-jawn_2.13:0.23.30 (depends on 1.5.1)
[info] +- org.typelevel:jawn-fs2_2.13:2.4.0 (depends on 1.4.0)
[info] * org.typelevel:cats-core_2.13:2.13.0 is selected over {2.11.0, 2.10.0, 2.8.0}
[info] +- io.circe:circe-core_2.13:0.14.13 (depends on 2.13.0)
[info] +- org.typelevel:vault_2.13:3.6.0 (depends on 2.11.0)
[info] +- org.typelevel:log4cats-core_2.13:2.7.0 (depends on 2.10.0)
[info] +- org.typelevel:keypool_2.13:0.4.10 (depends on 2.11.0)
[info] +- org.typelevel:cats-mtl_2.13:1.3.1 (depends on 2.8.0)
[info] +- org.typelevel:cats-effect-kernel_2.13:3.6.3 (depends on 2.11.0)
[info] +- org.typelevel:case-insensitive_2.13:1.4.2 (depends on 2.11.0)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 2.8.0)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 2.11.0)
[info] +- com.comcast:ip4s-core_2.13:3.6.0 (depends on 2.10.0)
[info] +- co.fs2:fs2-core_2.13:3.12.0 (depends on 2.11.0)
[info] * co.fs2:fs2-io_2.13:3.12.0 is selected over 3.11.0
[info] +- com.example:ce361_2.13:0.0.1-SNAPSHOT (depends on 3.12.0)
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 3.11.0)
[info] * org.typelevel:cats-effect-kernel_2.13:3.6.3 is selected over 3.3.14
[info] +- org.typelevel:cats-effect-std_2.13:3.6.3 (depends on 3.6.3)
[info] +- org.typelevel:cats-effect_2.13:3.6.3 (depends on 3.6.3)
[info] +- org.http4s:http4s-crypto_2.13:0.2.4 (depends on 3.3.14)
[info] * org.slf4j:slf4j-api:1.7.36 is selected over 1.7.30
[info] +- org.http4s:http4s-core_2.13:0.23.30 (depends on 1.7.36)
[info] +- org.typelevel:log4cats-slf4j_2.13:2.7.0 (depends on 1.7.36)
[info] +- org.log4s:log4s_2.13:1.10.0 (depends on 1.7.30)
Still digging on this, and ran into:
https://bugs.openjdk.org/browse/JDK-5073504
There are no JavaDocs for ServerSocketChannel.close() -- the reader must default to the Channel.close() docs, which are not specific about the function of the close() method on a non-blocking ServerSocketChannel and the releasing of OS resources. Specifically, after closing a non-blocking ServerSocketChannel that has been registered with a Selector, the isOpen() method will return false. There is no reason for the developer to believe that the close operation is not complete at this point. However, the port has not been freed to the OS...and a subsequent open()/bind() to the same port number will fail. Even if you have de-registered the channel with the selector via SelectionKey.cancel(), the resource is still not freed. You must make another call to Selector.select() in order for the resources to be freed. In our case, our application knows that there are no other resources registered to the Selector and so there was no reason (or so we thought) to make another call to the Selector.
It seems like this might be it. I checked out fs2 and modified the close logic slightly to wait until the channel no longer thinks it's registered before completing. It now passes my repro test case (with a slightly longer timeout, but I think that's acceptable for a test that opens and closes 1000 servers in a row).
Here's the new logic:
.make(F.delay(selector.provider.openServerSocketChannel())) { ch =>
def wait: F[Unit] = F.delay(ch.isRegistered()).ifM(F.sleep(2.millis) >> wait, F.unit)
F.delay(ch.close()) >> wait
}