
NullPointerException during Client Node Join in Apache Ignite 2.15.0

Open jamnaritesh opened this issue 8 months ago • 3 comments

Environment:
- Apache Ignite Version: 2.15.0
- Cluster Size: 14 server nodes, 4 client nodes
- Deployment: Dockerized nodes with host networking
- JDK: Java 17
- OS: RHEL 9

Summary

When a client node attempts to join an existing Apache Ignite cluster, a NullPointerException is thrown during the partition map exchange phase. The error occurs in GridCachePartitionExchangeManager.clientTopology, where it attempts to invoke .config() on a null CacheGroupDescriptor.

Error Logs

04-04-2025 19:59:10.857 [sys-#86] ERROR ROOT.? - Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.NullPointerException: Cannot invoke "o.a.i.i.processors.cache.CacheGroupDescriptor.config()" because "grpDesc" is null]]
java.lang.NullPointerException: Cannot invoke "org.apache.ignite.internal.processors.cache.CacheGroupDescriptor.config()" because "grpDesc" is null
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:1055)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.lambda$updatePartitionFullMap$81bdb8e8$1(GridDhtPartitionsExchangeFuture.java:4831)
    at org.apache.ignite.internal.util.IgniteUtils.lambda$null$3(IgniteUtils.java:11609)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
    Suppressed: java.lang.NullPointerException: Cannot invoke "org.apache.ignite.internal.processors.cache.CacheGroupDescriptor.config()" because "grpDesc" is null
        ... 7 common frames omitted
04-04-2025 19:59:10.899 [sys-#86] ERROR ROOT.? - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.NullPointerException: Cannot invoke "o.a.i.i.processors.cache.CacheGroupDescriptor.config()" because "grpDesc" is null]]

Reproduction Steps

1. Start a cluster with 14 server nodes.
2. Try joining 4 client nodes in parallel as the cluster is coming up.

Impact

Prevents successful join of client nodes during topology changes.

Expected Behavior

Client node should be able to join and complete partition exchange without encountering NullPointerException.

jamnaritesh avatar Apr 04 '25 12:04 jamnaritesh

Just to add more info: if I retry the node join at a later point, it works fine. Can someone please guide me towards what could be wrong here? We are starting the server nodes in parallel, and then starting some client nodes. The cluster is backed by Cassandra as the backing store and has auto baseline enabled.
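The retry mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: `JoinRetry`, `retryWithBackoff`, and its parameters are hypothetical names and not part of Ignite, and the join itself is represented by a generic `Callable`. Note also that the log above shows the failure handler halting the JVM, so if the error fires on the joining node an in-process retry like this would not get the chance to run, and the retry would have to happen at the process level instead.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not an Ignite API): retries an action with exponential backoff.
final class JoinRetry {
    private JoinRetry() {}

    // Runs the action up to maxAttempts times, doubling the delay after each failure.
    // Returns the first successful result, or rethrows the last failure.
    static <T> T retryWithBackoff(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when interrupted; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

In a real deployment the callable would presumably wrap something like `Ignition.start(cfg)` with a client-mode `IgniteConfiguration`, for example `JoinRetry.retryWithBackoff(() -> Ignition.start(cfg), 5, 2000L)`; the attempt count and delay here are arbitrary placeholders.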

jamnaritesh avatar Apr 04 '25 12:04 jamnaritesh

Does it happen in more recent versions (Ignite 2.17)?

ptupitsyn avatar Apr 04 '25 13:04 ptupitsyn

Hey @ptupitsyn, thanks for your response. I'm using an internal wrapper on top of Ignite, so I have not had a chance to test it with 2.17.0 yet. Was this fixed as part of the 2.17.0 release?

jamnaritesh avatar Apr 04 '25 18:04 jamnaritesh