gravitino icon indicating copy to clipboard operation
gravitino copied to clipboard

[Improvement] Fix race condition in loadRole

Open justinmclean opened this issue 1 month ago • 6 comments

What would you like to be improved?

The loadRole method (in core/src/main/java/org/apache/gravitino/authorization/AuthorizationRequestContext.java) isn’t thread-safe as it has a race condition. Because hasLoadRole.get() and hasLoadRole.set(true) are separate operations, multiple threads can see false and run the runnable simultaneously.

Suppose two threads call loadRole() at roughly the same time:

  • Both check hasLoadRole.get() before it’s set and both see false.
  • Both enter the if body and run runnable.run().
  • Then both set hasLoadRole.set(true) afterward.

So you can end up with double (or more) executions of the runnable.

Here's a unit test to illustrate:

@Test
  public void testLoadRoleRunsOnceEvenWhenInvokedConcurrently() throws Exception {
    AuthorizationRequestContext context = new AuthorizationRequestContext();
    AtomicInteger counter = new AtomicInteger();
    CountDownLatch firstStarted = new CountDownLatch(1);
    CountDownLatch allowFinish = new CountDownLatch(1);

    Thread firstInvocation =
        new Thread(
            () ->
                context.loadRole(
                    () -> {
                      counter.incrementAndGet();
                      firstStarted.countDown();
                      try {
                        allowFinish.await(5, TimeUnit.SECONDS);
                      } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                      }
                    }));
    firstInvocation.start();

    try {
      assertTrue(firstStarted.await(5, TimeUnit.SECONDS));
      context.loadRole(() -> counter.incrementAndGet());
      assertEquals(1, counter.get(), "loadRole should execute runnable only once");
    } finally {
      allowFinish.countDown();
      firstInvocation.join();
    }

    context.loadRole(() -> counter.incrementAndGet());
    assertEquals(1, counter.get(), "Subsequent loadRole calls should be ignored");
  }

How should we improve?

AtomicBoolean makes single operations thread-safe, but not combinations of them, use compareAndSet() instead.

justinmclean avatar Nov 12 '25 03:11 justinmclean

@HatshMehta112 you need to comment here so I can assign it to you.

Your fix does fix the race condition, but I think you may need to do a little more. There are two issues here I think 1) If runnable.run() fails, the flag stays true, even though the role didn’t load. 2) The meaning has changed from "has loaded" to "has started loading" this may have implications for the code that calls loadRole().

justinmclean avatar Nov 12 '25 05:11 justinmclean

@justinmclean Good catch — you’re right that the current implementation sets the flag before running runnable.run(), which means it could remain true even if the load fails. It also changes the flag’s meaning from “has loaded” to “has started loading.”

Can we do it like

 if (!hasLoadRole.compareAndSet(false, true)) {
    return;
  }
  try {
    runnable.run();
  } catch (Exception e) {
    hasLoadRole.set(false); // Reset flag if loading fails
    throw e;
  }

This ensures the flag still means “has loaded” and handles failure cases correctly.

HarshMehta112 avatar Nov 12 '25 05:11 HarshMehta112

That's better and fixes 1, but I think it's best to check how this is called to see if “has loaded” to “has started loading" has an impact.

justinmclean avatar Nov 12 '25 05:11 justinmclean

@justinmclean, I observed the following issues:

  • The method is called from JcasbinAuthorizer.loadRolePrivilege().
  • The runnable modifies shared state (allowEnforcer, denyEnforcer).
  • It performs I/O operations (database reads).
  • Multiple threads could corrupt the enforcer state if they run concurrently.

Recommendation

It’s better to implement double-checked locking as shown below:

public void loadRole(Runnable runnable) {
  if (hasLoadRole.get()) {
    return;
  }
  synchronized (this) {
    // Double-check after acquiring lock (prevents race)
    if (hasLoadRole.get()) {
      return;
    }
    try {
      runnable.run(); // Execute the loading
      hasLoadRole.set(true); // Mark as loaded ONLY after success
    } catch (Exception e) {
      // Don't set flag on failure - allow retry
      throw e;
    }
  }
}

If role loading failsfor any reason the flag remains false, allowing a retry.

I’d appreciate your feedback or suggestions on this.

HarshMehta112 avatar Nov 13 '25 03:11 HarshMehta112

Yes, that looks much better.

justinmclean avatar Nov 14 '25 06:11 justinmclean

Yes, that looks much better.

@justinmclean Thank you for your feedback. I’ve updated the PR with the requested changes. Please review it at your convenience.

HarshMehta112 avatar Nov 14 '25 10:11 HarshMehta112