DAOS-11227 pool: Retry map_refresh on more errors
POOL_TGT_QUERY_MAP RPCs can fail with many different remote errors. For instance, a restarted-but-not-yet-reintegrated engine may return -DER_NO_HDL; Zhao Zhen has also observed -DER_OOG. For errors that are not normally retryable, this patch lets map_refresh retry a limited number of times and then fall back to dc_pool_query. Related changes:
- To use dc_pool_query, dc_pool_create_map_refresh_task has to take a pool handle instead of a dc_pool object.
- Tune the backoff sequence of a map_refresh task a bit for hopefully better scalability.
- Add a new daos_test case for the new fallback mechanism, and bump the corresponding test timeout a little, since the new test involves a rebuild/reintegration cycle.
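
The retry-then-fallback decision described above can be sketched roughly as follows. This is an illustration only, not the code in this patch: every name here (sketch_err, sketch_state, sketch_next_action, SK_MAX_OTHER_RETRIES) is hypothetical, and the retry limit of 3 is an assumed value, since the actual limit is not stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error classes; stand-ins for real DER_* codes. */
enum sketch_err {
	SK_OK        = 0,  /* RPC succeeded */
	SK_RETRYABLE = -1, /* an error that is normally retryable */
	SK_OTHER     = -2  /* any other remote error (e.g. a stale handle) */
};

/* What the map_refresh task should do next. */
enum sketch_action {
	SK_DONE,
	SK_RETRY_TGT_QUERY,     /* resend the target-level map query */
	SK_FALLBACK_POOL_QUERY  /* give up on targets, use the pool query */
};

/* Assumed limit on retries for errors that are not normally retryable. */
#define SK_MAX_OTHER_RETRIES 3

struct sketch_state {
	int other_err_retries; /* retries spent on SK_OTHER so far */
};

/* Decide the next step from the result of the last target query. */
enum sketch_action
sketch_next_action(struct sketch_state *st, enum sketch_err rc)
{
	assert(st != NULL);

	if (rc == SK_OK)
		return SK_DONE;
	if (rc == SK_RETRYABLE)
		return SK_RETRY_TGT_QUERY; /* unlimited, as before */
	if (st->other_err_retries < SK_MAX_OTHER_RETRIES) {
		st->other_err_retries++;
		return SK_RETRY_TGT_QUERY; /* limited retry budget */
	}
	return SK_FALLBACK_POOL_QUERY;
}
```

In this sketch, normally retryable errors do not consume the budget; only the other errors count toward the limit before the fallback is taken.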
Signed-off-by: Li Wei [email protected]
Required-githooks: true
Bug-tracker data:
Ticket title is 'dfuse got error DER_NO_PERM after kill one engine'
Status is 'In Progress'
Labels: 'daily_test,triaged'
https://daosio.atlassian.net/browse/DAOS-11227