ranger icon indicating copy to clipboard operation
ranger copied to clipboard

RANGER-5322: PolicyRefresher can struck with no download thread if exception thrown in startRefresher

Open vyommani opened this issue 3 months ago • 3 comments

What changes were proposed in this pull request?

Handled the exception thrown in loadRoles() and loadPolicy()

How was this patch tested?

run mvn clean install and all tests are clean.

vyommani avatar Sep 13 '25 10:09 vyommani

I updated the PR after further debugging and found that both loadRoles() and loadPolicy() handle all exceptions internally without throwing any. While the code isn't very clean or readable, it is correct. In this PR, I fixed a couple of minor issues and added tests to cover the scenarios.

vyommani avatar Oct 17 '25 15:10 vyommani

@vyommani I see the usages of timeout/sleep in the tests so asking, how much time latency does the new tests add to the current tests? Would it be a good idea to keep constants like TEST_TIMEOUT_SECONDS in a constants class whereby all timeouts across different tests can be maintained in a single file, it might be helpful to segment these variables for (short, mod, long) running tests.

CC: @mneethiraj

kumaab avatar Oct 27 '25 22:10 kumaab

@vyommani I see the usages of timeout/sleep in the tests so asking, how much time latency does the new tests add to the current tests? Would it be a good idea to keep constants like TEST_TIMEOUT_SECONDS in a constants class whereby all timeouts across different tests can be maintained in a single file, it might be helpful to segment these variables for (short, mod, long) running tests.

CC: @mneethiraj

@kumaab, It will be really good if we create a separate file where we put all the wait timeout etc. For the newly added test below are the approximate worst and average case times.

Worst-case latency: 10-15 seconds (due to factors like slow I/O, thread scheduling, or retries) Actual latency:

1->Latch countdown: 100-500 ms 2->File polling: under 1 second 3->Real average per test: ~1-3 seconds

The latency is deemed acceptable to ensure reliable testing of the PolicyRefresher background thread, which involves: Asynchronous policy loading

1->Polling Ranger Admin 2->Writing cached policies to disk 3->Handling retries and edge cases

Without these waits, tests might be flaky, passing locally but failing in CI due to timing issues.

vyommani avatar Oct 28 '25 03:10 vyommani