Kafka Dev Service Loading on Different Docker Network

Open rubik-cube-man opened this issue 3 months ago • 7 comments

Describe the bug

When launching a @QuarkusIntegrationTest that uses the built-in Kafka Dev Service, the broker container is started in a different Docker network from the application, making Kafka (Redpanda by default) unreachable from inside the application.

Expected behavior

Kafka is reachable from inside the application and from other containers running in the shared network.

Actual behavior

Kafka is launched in a different Docker network.

This can be observed by inspecting the containers.

quarkus-integration-test pod:

"NetworkMode": "quarkus-integration-test-ceSNx"

Kafka (redpanda) pod:

"NetworkMode": "b39ca549e1f11fde7240290f3f7add0729ba6f68791ba7a4cbc461c8a09c5ccf"

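The comparison above can be automated. The following is a minimal sketch, not part of the reproducer: it assumes the text passed in is the output of `docker inspect` for each container, and the class and method names (`NetworkModeCheck`, `networkMode`, `onSameNetwork`) are hypothetical. It uses a regular expression rather than a JSON parser to stay dependency-free.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compares the "NetworkMode" values found in two `docker inspect` outputs. */
public class NetworkModeCheck {

    // Matches "NetworkMode": "<value>" and captures <value>.
    private static final Pattern NETWORK_MODE =
            Pattern.compile("\"NetworkMode\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the first NetworkMode value in the given inspect output, if any. */
    public static Optional<String> networkMode(String inspectOutput) {
        Matcher matcher = NETWORK_MODE.matcher(inspectOutput);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** True only if both outputs contain a NetworkMode and the two values are identical. */
    public static boolean onSameNetwork(String firstInspect, String secondInspect) {
        Optional<String> first = networkMode(firstInspect);
        return first.isPresent() && first.equals(networkMode(secondInspect));
    }

    public static void main(String[] args) {
        // Values copied from the report above.
        String app = "\"NetworkMode\": \"quarkus-integration-test-ceSNx\"";
        String kafka = "\"NetworkMode\": \"b39ca549e1f11fde7240290f3f7add0729ba6f68791ba7a4cbc461c8a09c5ccf\"";
        System.out.println(onSameNetwork(app, kafka)); // prints "false"
    }
}
```

This is a plain string comparison, so it can report a mismatch when one container records a network name and the other records the ID of that same network. Treat a `false` result as a prompt to look closer, not as proof.
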
How to Reproduce?

I've created a GitHub project that reproduces the problem. The config is fairly minimal: all it basically does is run a very simple integration test, with a REST endpoint to validate that the application has successfully connected to Kafka.

https://github.com/rubik-cube-man/kafka-integration-test

Running on 3.24.5 allows this test to pass, but all versions beyond that fail.

Output of uname -a or ver

Windows 11

Output of java -version

openjdk version "21.0.4" 2024-07-16 LTS

Quarkus version or git rev

Reproducible on 3.25.0+ (tested up to 3.26.3)

Build tool (ie. output of mvnw --version or gradlew --version)

Gradle 8.14

Additional information

I've spent a lot of time trying to debug this, and it seems to come down to the order in which things happen.

As far as I understand, this happens because the network id is now being populated in a different order.

The network id from here is loaded from getSharedNetworkId(). This seems to return the network id from Network.SHARED only if it has already been populated by something else.

It appears that Kafka also uses Network.SHARED. Prior to 3.25.0, the Kafka Dev Service seemed to start before the DevServicesNetworkIdBuildItem was populated, which constructed Network.SHARED. From 3.25.0 onwards, it seems that the DevServicesNetworkIdBuildItem is created before the Kafka Dev Services are started, so Network.SHARED is not yet populated. As a result, the DevServicesNetworkIdBuildItem is constructed with a null network id, and the application falls back to a network id randomly generated here.
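
The ordering described above can be modelled in a few lines. This is a simplified, hypothetical sketch and not Quarkus or Testcontainers code: `LazySharedNetwork` stands in for a lazily created shared network such as Network.SHARED, `idIfInitialized()` stands in for the "only if already populated" lookup, and `resolveAppNetwork` stands in for whatever decides which network the application container joins. All of these names are assumptions made for illustration.

```java
import java.util.Optional;
import java.util.UUID;

/** Simplified model of the suspected ordering problem; all names are hypothetical. */
public class SharedNetworkOrderingSketch {

    /** Stand-in for a lazily created shared network. */
    static final class LazySharedNetwork {
        private String id; // stays null until something asks for the network

        /** Creates the network on first use and returns its id. */
        String getId() {
            if (id == null) {
                id = "shared-" + UUID.randomUUID();
            }
            return id;
        }

        /** Returns the id only if the network already exists; never creates it. */
        Optional<String> idIfInitialized() {
            return Optional.ofNullable(id);
        }
    }

    /** Picks the shared network if it exists, otherwise falls back to a random network name. */
    static String resolveAppNetwork(LazySharedNetwork shared) {
        return shared.idIfInitialized()
                .orElseGet(() -> "quarkus-integration-test-" + UUID.randomUUID().toString().substring(0, 5));
    }

    /** Dev service starts first, so the shared network exists when the app network is resolved. */
    public static boolean sameNetworkWhenDevServiceStartsFirst() {
        LazySharedNetwork shared = new LazySharedNetwork();
        String kafkaNetwork = shared.getId();
        String appNetwork = resolveAppNetwork(shared);
        return kafkaNetwork.equals(appNetwork);
    }

    /** App network is resolved first, so the lookup finds nothing and the fallback is used. */
    public static boolean sameNetworkWhenIdIsResolvedFirst() {
        LazySharedNetwork shared = new LazySharedNetwork();
        String appNetwork = resolveAppNetwork(shared);
        String kafkaNetwork = shared.getId();
        return kafkaNetwork.equals(appNetwork);
    }

    public static void main(String[] args) {
        System.out.println(sameNetworkWhenDevServiceStartsFirst()); // prints "true"
        System.out.println(sameNetworkWhenIdIsResolvedFirst());     // prints "false"
    }
}
```

In this model the only difference between the passing and failing cases is which call happens first, which matches the behaviour change observed between 3.24.5 and 3.25.0.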

rubik-cube-man, Sep 16 '25 15:09

/cc @alesj (kafka), @cescoffier (kafka), @geoand (devservices), @holly-cummins (devservices), @ozangunalp (devservices,kafka)

quarkus-bot[bot], Sep 16 '25 15:09

Thanks for the detailed analysis! It always makes the fix easier. It looks like this is a side effect of https://github.com/quarkusio/quarkus/issues/47627. Hopefully we have enough information at the build step stage to emit a corrected build item. Hopefully.

holly-cummins, Sep 16 '25 15:09

That's odd. I need to investigate it. I've assigned this to myself.

ozangunalp, Sep 16 '25 16:09

I suspect that this is a Windows issue. I ran the tests using

./gradlew clean test integrationTest quarkusIntTest --no-build-cache --info

and couldn't reproduce it.

ozangunalp, Sep 17 '25 07:09

> I suspect that this is a Windows issue. I ran the tests using
>
> ./gradlew clean test integrationTest quarkusIntTest --no-build-cache --info
>
> and couldn't reproduce it.

Thanks for having a look into this issue!

Using your command I was able to get a passing result as well, but that seems to be because it's missing the image build argument, so it appears to skip the integration tests. This is the command I ran locally to get it to fail again:

./gradlew clean test integrationTest quarkusIntTest --no-build-cache --info -Dquarkus.container-image.build=true

rubik-cube-man, Sep 17 '25 09:09

Definitely, we need to run the integration test against the app running in a container.

I've identified the problem. Let me explain:

With the Dev Services Lifecycle effort, we've made some changes to how containers are configured, notably for the network, to make it easier to write new dev services. Because Quarkus handles different classloaders during build and test (e.g. https://github.com/orgs/quarkusio/projects/30), we've been very defensive about not creating multiple SHARED Testcontainers networks.

The combination of the two made it so that, in some cases, the SHARED network is created too late, so the integration test is not aware of it.

I am a bit surprised we didn't catch it in an integration test.

I think I've got the fix for it. I'll make sure this is covered by an IT.

FYI @holly-cummins

ozangunalp, Sep 17 '25 14:09

Hi @ozangunalp, any news about the fix? I stumbled upon the same issue this week while migrating our own dev service.

Are there some workarounds or any way I could help?

janscheidegger, Dec 05 '25 07:12

I am having a similar problem, not only with @QuarkusIntegrationTest but also with a regular @QuarkusTest that uses Dev Services. Dev Services are created multiple times under different Docker networks. The Testcontainers reuse flag doesn't seem to have any effect on it.

bkalas, Dec 16 '25 09:12