Kafka Dev Service Loading on Different Docker Network
Describe the bug
When launching a @QuarkusIntegrationTest which uses the built-in Kafka Dev Service, the Kafka container (Redpanda by default) is started in a different Docker network from the application container, making Kafka unreachable from inside your application.
Expected behavior
Kafka is reachable from inside the application and from other containers running inside the shared network.
Actual behavior
Kafka is launched inside a different Docker network.
This can be observed by inspecting the containers:
quarkus-integration-test pod:
"NetworkMode": "quarkus-integration-test-ceSNx"
Kafka (redpanda) pod:
"NetworkMode": "b39ca549e1f11fde7240290f3f7add0729ba6f68791ba7a4cbc461c8a09c5ccf"
How to Reproduce?
I've created a GitHub project which reproduces the problem. The config is fairly minimal: all it does is run a very simple integration test, with a REST endpoint to validate that the application has successfully connected to Kafka.
https://github.com/rubik-cube-man/kafka-integration-test
Running on 3.24.5 allows this test to pass, but all versions beyond that fail.
Output of uname -a or ver
Windows 11
Output of java -version
openjdk version "21.0.4" 2024-07-16 LTS
Quarkus version or git rev
Reproducible on 3.25.0+ (tested up to 3.26.3)
Build tool (ie. output of mvnw --version or gradlew --version)
Gradle 8.14
Additional information
I've spent a lot of time trying to debug this, and it seems to come down to the order in which things happen.
From my understanding, this is happening because the network id is now being populated in a different order.
The network id from here is loaded from getSharedNetworkId(). This seems to only get the network id from Network.SHARED, if it's already been populated by something else.
It appears that Kafka also uses Network.SHARED, however prior to 3.25.0, it seemed to load before the DevServicesNetworkIdBuildItem was populated. This constructed Network.SHARED. From 3.25.0 onwards, it seems that the DevServicesNetworkIdBuildItem is created before the Kafka DevServices are started, and hence Network.SHARED is not populated. This results in the DevServicesNetworkIdBuildItem being constructed with a null network id. This results in the application falling back to a network id randomly generated here.
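The ordering described above can be sketched in plain Java. This is only an illustration of my understanding, not the Quarkus or Testcontainers code: `LazySharedNetwork`, `peekId`, `resolveAppNetwork` and `sameNetwork` are hypothetical names, and the model assumes the shared network is created lazily on first use and that the application falls back to a randomly named network when no shared id exists yet.

```java
import java.util.UUID;

class SharedNetworkOrdering {

    // Hypothetical stand-in for a lazily created shared network.
    // This is NOT the Testcontainers Network.SHARED implementation.
    static final class LazySharedNetwork {
        private String id; // stays null until something first uses the network

        // Creates the network on first use, as a dev service container would.
        String getOrCreateId() {
            if (id == null) {
                id = "shared-" + UUID.randomUUID();
            }
            return id;
        }

        // Reads the id without creating the network (assumed behaviour of
        // the "get shared network id" lookup described in the issue).
        String peekId() {
            return id;
        }
    }

    // Assumed fallback: reuse the shared id if present, else a random network.
    static String resolveAppNetwork(LazySharedNetwork shared) {
        String existing = shared.peekId();
        if (existing != null) {
            return existing;
        }
        return "quarkus-integration-test-" + UUID.randomUUID().toString().substring(0, 5);
    }

    // Returns true when Kafka and the application end up on the same network.
    static boolean sameNetwork(boolean devServiceStartsFirst) {
        LazySharedNetwork shared = new LazySharedNetwork();
        String kafkaNetwork;
        String appNetwork;
        if (devServiceStartsFirst) {
            // Ordering I believe applied up to 3.24.5
            kafkaNetwork = shared.getOrCreateId();
            appNetwork = resolveAppNetwork(shared);
        } else {
            // Ordering I believe applies from 3.25.0 onwards
            appNetwork = resolveAppNetwork(shared);
            kafkaNetwork = shared.getOrCreateId();
        }
        return kafkaNetwork.equals(appNetwork);
    }

    public static void main(String[] args) {
        System.out.println("dev service first: same network = " + sameNetwork(true));
        System.out.println("network id first:  same network = " + sameNetwork(false));
    }
}
```

In this model the first ordering puts both containers on the shared network, while the second leaves the application on its own fallback network, which matches the two different NetworkMode values shown above.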
/cc @alesj (kafka), @cescoffier (kafka), @geoand (devservices), @holly-cummins (devservices), @ozangunalp (devservices,kafka)
Thanks for the detailed analysis! It always makes the fix easier. It looks like this is a side effect of https://github.com/quarkusio/quarkus/issues/47627. Hopefully we have enough information at the build step stage to emit a corrected build item. Hopefully.
That's odd. I need to investigate it. I've assigned this to myself.
I suspect that this is a windows issue.
I've run tests using
./gradlew clean test integrationTest quarkusIntTest --no-build-cache --info
and couldn't reproduce it.
Thanks for having a look into this issue!
Using your command I was able to get a passing result as well, but that looks to be because it's missing the image build argument, so the integration tests are skipped. This is the command I ran locally to get it failing again:
./gradlew clean test integrationTest quarkusIntTest --no-build-cache --info -Dquarkus.container-image.build=true
Definitely, we need to run the integration test against the app running in container.
I've identified the problem. Let me explain:
With the Dev Services Lifecycle effort, we've made some changes to how containers are configured, notably for the network, to help with writing new dev services. Because Quarkus is handling different classloaders during build and test (ex. https://github.com/orgs/quarkusio/projects/30), we've been very defensive about not creating multiple SHARED Testcontainers networks.
The combination of the two made it so that, in some cases, the SHARED network is created too late, so the integration test is not aware of it.
I am a bit surprised we didn't catch it in an integration test.
I think I've got the fix for it. I'll make sure this is covered by an IT.
FYI @holly-cummins
Hi @ozangunalp any news about the fix? Stumbled upon the same issue this week while migrating our own devservice.
Are there some workarounds or any way I could help?
I am having a similar problem, not only with @QuarkusIntegrationTest but also with a regular @QuarkusTest which uses dev services. Dev services are created multiple times under different Docker networks. The Testcontainers reuse flag doesn't seem to have any effect on it.