Bug Report: payara micro hzinitialjoinwait option has no effect/FISH-7372
Brief Summary
Setting the hazelcast.wait.seconds.before.join property has no effect (either with --hzinitialjoinwait X or -Dhazelcast.wait.seconds.before.join=X)
The value eventually used by hazelcast is always 1.
Expected Outcome
The value used by hazelcast should be the one provided on the command line.
Current Outcome
Hazelcast always use a hardcoded value of 1
Reproducer
Reproduce
Start a single payara-micro instance. Set clusterMode to dns and set hzinitialjoinwait to 30
java -jar ./payara-micro-6.2023.4.jar --clusterMode dns:localhost:6900 --hzinitialjoinwait 30
When the instance tries to join a hazelcast cluster, it should try for 30 seconds before creating a new one. This is not the case: it just tries for 1 second.
In hazelcast, the piece of code that is waiting is DiscoveryJoiner lines 66-82
The 30 seconds delay should be visible in the logs:
[...]
[2023-04-21T10:16:53.399+0200] [] [INFO] [] [com.hazelcast.system.logo] [tid: _ThreadID=25 _ThreadName=RunLevelControllerThread-1682065012770] [timeMillis: 1682065013399] [levelValue: 800] [[
[fe80:0:0:0:9829:46ff:fef2:3daa]:6900 [development] [5.2.2]
+ + o o o o---o o----o o o---o o o----o o--o--o
+ + + + | | / \ / | | / / \ | |
+ + + + + o----o o o o o----o | o o o o----o |
+ + + + | | / \ / | | \ / \ | |
+ + o o o o o---o o----o o----o o---o o o o----o o ]]
[...]
[2023-04-21T10:16:53.909+0200] [] [INFO] [] [com.hazelcast.core.LifecycleService] [tid: _ThreadID=25 _ThreadName=RunLevelControllerThread-1682065012770] [timeMillis: 1682065013909] [levelValue: 800] [fe80:0:0:0:9829:46ff:fef2:3daa]:6900 [development] [5.2.2] [fe80:0:0:0:9829:46ff:fef2:3daa]:6900 is STARTING
[ !!! expect a 30 seconds delay before next log statement !!! ]
[2023-04-21T10:16:55.058+0200] [] [INFO] [] [com.hazelcast.internal.cluster.ClusterService] [tid: _ThreadID=25 _ThreadName=RunLevelControllerThread-1682065012770] [timeMillis: 1682065015058] [levelValue: 800] [[
[fe80:0:0:0:9829:46ff:fef2:3daa]:6900 [development] [5.2.2]
Members {size:1, ver:1} [
Member [fe80:0:0:0:9829:46ff:fef2:3daa]:6900 - e97b06d8-9ffb-411f-bc6a-aed91ece2f41 this
]
]]
[...]
Details
Payara micro bootstrap
First, fish.payara.micro.impl.PayaraMicroImpl#configureHazelcast ensures the system property hazelcast.wait.seconds.before.join is set.
If this is not the case, the property is initialized according to the hzinitialjoinwait option. If no option was provided, property is set to "1".
This step looks fine.
Hazelcast bootstrap
Later, fish.payara.nucleus.hazelcast.HazelcastCore#buildConfiguration creates the hazelcast config object.
On line 394 WAIT_SECONDS_BEFORE_JOIN property is set to "1", hiding the previously set system property.
Hot fix test
Start and debug a single payara micro instance:
java -Xdebug -Xrunjdwp:transport=dt_socket,address=9009,server=y,suspend=y -jar ./payara-micro-6.2023.4.jar --clusterMode dns:localhost:6900 --hzinitialjoinwait 30
Break on HazelcastCore:396
Revert the previous statement by executing config.getProperties().remove(WAIT_SECONDS_BEFORE_JOIN.getName())
Continue code execution : instance will try to join an existing cluster for 30 seconds.
[...]
[2023-04-21T10:18:34.766+0200] [] [INFO] [] [com.hazelcast.system.logo] [tid: _ThreadID=26 _ThreadName=RunLevelControllerThread-1682065109743] [timeMillis: 1682065114766] [levelValue: 800] [[
[fe80:0:0:0:9829:46ff:fef2:3daa]:6900 [development] [5.2.2]
+ + o o o o---o o----o o o---o o o----o o--o--o
+ + + + | | / \ / | | / / \ | |
+ + + + + o----o o o o o----o | o o o o----o |
+ + + + | | / \ / | | \ / \ | |
+ + o o o o o---o o----o o----o o---o o o o----o o ]]
[...]
[2023-04-21T10:18:35.301+0200] [] [INFO] [] [com.hazelcast.core.LifecycleService] [tid: _ThreadID=26 _ThreadName=RunLevelControllerThread-1682065109743] [timeMillis: 1682065115301] [levelValue: 800] [fe80:0:0:0:9829:46ff:fef2:3daa]:6900 [development] [5.2.2] [fe80:0:0:0:9829:46ff:fef2:3daa]:6900 is STARTING
[ ! 30 seconds delay is here ! ]
[2023-04-21T10:19:05.474+0200] [] [INFO] [] [com.hazelcast.internal.cluster.ClusterService] [tid: _ThreadID=26 _ThreadName=RunLevelControllerThread-1682065109743] [timeMillis: 1682065145474] [levelValue: 800] [[
[fe80:0:0:0:9829:46ff:fef2:3daa]:6900 [development] [5.2.2]
Members {size:1, ver:1} [
Member [fe80:0:0:0:9829:46ff:fef2:3daa]:6900 - c48b0915-c77a-466a-90d8-77f723c44353 this
]
]]
Solution
It seems the reason the line in HazelcastCore exists is to prevent setting the property to "0".
// can't quite set it to zero yet because of:
// https://github.com/hazelcast/hazelcast/issues/17586
config.setProperty(WAIT_SECONDS_BEFORE_JOIN.getName(), "1");
In the scope of payara-micro, this is useless as payara-micro-bootstrap already defaults the property to "1".
I do not know what happen with other payara flavours.
One solution would be to set the property only to prevent the use of "0" with something like
String initialJoinWaitProperty = System.getProperty("hazelcast.wait.seconds.before.join")
if (initialJoinWaitProperty == null || Integer.parseInt(initialJoinWaitProperty) < 1) {
config.setProperty(WAIT_SECONDS_BEFORE_JOIN.getName(), "1");
}
Operating System
Linux
JDK Version
OpenJDK 11
Payara Distribution
Payara Micro
Hello @maxencelaurent!
Thank you for reporting this bug and for providing such a detailed reproducer, analysis and possible solution! I have raised an internal issue with the internal code FISH-7372, in which I related the information on this issue. As you already found a possible solution, you could send a PR which is going to be reviewed by a member of our team. This would greatly speed up the resolution of this issue.