DAOS-3916 container,md: query metadata access, modify times

Open kccain opened this issue 3 years ago • 3 comments

With this change, daos_cont_open() and daos_cont_query() return the most recent metadata access and modify times in the daos_cont_info_t output argument. The new information is returned as hybrid logical clock (HLC) values. Container operations that only read the metadata state will update the access time only, while other operations will update both access and modify times. This is envisioned as a basic mechanism for a user to identify containers not used recently (from a metadata standpoint, not an IO standpoint).
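The sketch below shows the access/modify-time rule described above and the intended "not used recently" check. It is a minimal sketch, not the DAOS implementation. It assumes a generic HLC packing in which the low 18 bits hold a logical counter and the high bits hold physical time, which is not a statement of the DAOS HLC format. All names (`sketch_md_times`, `sketch_md_record`, `sketch_hlc_physical`, `sketch_md_idle`) are hypothetical, and the struct is not the real `daos_cont_info_t` layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION (illustrative only): low 18 bits of an HLC value are a logical
 * counter, high bits are physical time. Not the documented DAOS format. */
#define SKETCH_HLC_LOGICAL_BITS 18
#define SKETCH_HLC_LOGICAL_MASK ((UINT64_C(1) << SKETCH_HLC_LOGICAL_BITS) - 1)

/* Hypothetical stand-in for the two new time fields. */
struct sketch_md_times {
	uint64_t atime; /* most recent metadata access (HLC) */
	uint64_t mtime; /* most recent metadata modification (HLC) */
};

/* Record a metadata operation stamped with HLC value `now`.
 * Read-only operations advance atime only; modifying operations advance
 * both. The comparisons keep each field monotonic. */
static void sketch_md_record(struct sketch_md_times *t, uint64_t now,
			     bool modifies)
{
	if (now > t->atime)
		t->atime = now;
	if (modifies && now > t->mtime)
		t->mtime = now;
}

/* Physical-time part of an HLC value (logical counter cleared). */
static uint64_t sketch_hlc_physical(uint64_t hlc)
{
	return hlc & ~SKETCH_HLC_LOGICAL_MASK;
}

/* "Not used recently": no metadata access within `window` (same HLC units)
 * before `now`. */
static bool sketch_md_idle(const struct sketch_md_times *t, uint64_t now,
			   uint64_t window)
{
	uint64_t last = sketch_hlc_physical(t->atime);
	uint64_t cur  = sketch_hlc_physical(now);

	return cur >= last && cur - last >= window;
}
```

Note that the idle check looks only at atime. That matches the stated goal of spotting containers unused from a metadata standpoint, not an IO standpoint.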

This patch also fixes container upgrade: the original code supported only the global version 0->1 upgrade and asserted on upgrades from layout version 1->2 and beyond.
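One way to avoid hard-coding the 0->1 case is to upgrade one version step at a time until the target is reached. This is a sketch of that general approach, not the code from this patch. The names (`sketch_cont_md`, `sketch_upgrade_step`, `sketch_cont_upgrade`) are hypothetical, and the flags stand in for the real per-version KVS changes.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical target version (the PR reuses DAOS_POOL_GLOBAL_VERSION=2). */
#define SKETCH_LAYOUT_VERSION 2

/* Hypothetical container metadata state; flags model per-version changes. */
struct sketch_cont_md {
	uint32_t version;
	int      v1_changes_done; /* whatever the 0->1 upgrade adds */
	int      md_times_inited; /* 1->2: metadata times initialized */
};

/* Apply exactly one step: c->version -> c->version + 1. */
static int sketch_upgrade_step(struct sketch_cont_md *c)
{
	switch (c->version) {
	case 0:
		c->v1_changes_done = 1;
		break;
	case 1:
		c->md_times_inited = 1;
		break;
	default:
		return -ENOTSUP; /* no known step from this version */
	}
	c->version++;
	return 0;
}

/* Walk forward step by step instead of assuming the start version is 0,
 * so a 1->2 (or later) upgrade does not hit an assertion. */
static int sketch_cont_upgrade(struct sketch_cont_md *c, uint32_t target)
{
	if (c->version > target)
		return -EINVAL; /* downgrade not supported */
	while (c->version < target) {
		int rc = sketch_upgrade_step(c);

		if (rc != 0)
			return rc;
	}
	return 0;
}
```

With this approach, a container already at version 1 runs only the 1->2 step, which is where the metadata times would be initialized.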

A future patch is envisioned to provide a pool list containers interface with some filtering criteria, to find containers that may fit some user-determined criteria for migration or removal.

Changes summary

  • container properties KVS includes a new key/value pair for metadata times (using the same DAOS_POOL_GLOBAL_VERSION=2 for master / release 2.4 development).
  • pool/container upgrade code changed to initialize metadata times.
  • libdaos API minor version incremented (v2.3.0 -> v2.4.0)
  • daos_cont_info_t.ci_pad and .ci_redun_lvl repurposed to make room for the new time fields while keeping the structure size unchanged (the redundancy level is still available through container properties).
  • daos_test container tests modified to check redundancy level via property value rather than daos_cont_info_t field.
  • daos utility output (both JSON and human-readable when run with -v) contains the new metadata time information.
  • CaRT protocol for CONT_OPEN, CONT_OPEN_BYLABEL, and CONT_QUERY: existing version 6 maintained, and new version 7 added that returns the metadata access/modify times.
  • engine code registers and handles both protocol v6 and v7 RPCs.
  • client code registers and uses only the new (v7) protocol. (Possible future change: the client queries the engine, then registers v6 or v7.)
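Repurposing ci_pad/ci_redun_lvl keeps the structure size only if the new fields fit exactly in the bytes they replace. A minimal sketch of checking that at compile time follows. The before/after field lists are hypothetical layouts, not the actual daos_cont_info_t definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout (NOT the real daos_cont_info_t):
 * the tail holds a redundancy level plus padding. */
struct sketch_info_old {
	unsigned char uuid[16];
	uint64_t      lsnapshot;
	uint32_t      nhandles;
	uint32_t      nsnapshots;
	uint32_t      redun_lvl;
	uint32_t      pad[3];
};

/* Hypothetical "after" layout: the same 16 tail bytes now hold the two
 * HLC time fields. */
struct sketch_info_new {
	unsigned char uuid[16];
	uint64_t      lsnapshot;
	uint32_t      nhandles;
	uint32_t      nsnapshots;
	uint64_t      md_atime;
	uint64_t      md_mtime;
};

/* Fail the build if the structure size or the start of the tail moves,
 * so caller-allocated structures keep the size the library expects. */
_Static_assert(sizeof(struct sketch_info_old) ==
	       sizeof(struct sketch_info_new),
	       "structure size must not change");
_Static_assert(offsetof(struct sketch_info_old, redun_lvl) ==
	       offsetof(struct sketch_info_new, md_atime),
	       "time fields must reuse the old tail bytes");
```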

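The engine/client protocol split in the summary can be sketched as follows. The engine fills whichever reply format matches the RPC version it received. A small helper models the "possible future change" in which the client picks a version based on what the engine supports. The reply structs, version constants and function names are illustrative assumptions, not the CaRT RPC definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reply layouts; the real CaRT RPC definitions differ. */
struct sketch_query_out_v6 {
	int32_t  rc;
	uint32_t nsnapshots;
};

struct sketch_query_out_v7 {
	int32_t  rc;
	uint32_t nsnapshots;
	uint64_t md_atime; /* new in v7 */
	uint64_t md_mtime; /* new in v7 */
};

/* Engine side: both versions are registered, so fill the reply format that
 * matches the version the client sent. */
static int sketch_fill_query_reply(uint32_t proto_ver, void *out,
				   uint32_t nsnapshots, uint64_t atime,
				   uint64_t mtime)
{
	if (proto_ver == 6) {
		struct sketch_query_out_v6 *o = out;

		o->rc = 0;
		o->nsnapshots = nsnapshots;
		return 0;
	}
	if (proto_ver == 7) {
		struct sketch_query_out_v7 *o = out;

		o->rc = 0;
		o->nsnapshots = nsnapshots;
		o->md_atime = atime;
		o->md_mtime = mtime;
		return 0;
	}
	return -EPROTONOSUPPORT;
}

/* Client side, modelling the possible future change only: use v7 if the
 * engine supports it, else fall back to v6. The client in this PR always
 * uses v7. */
static uint32_t sketch_client_pick_ver(uint32_t engine_max_ver)
{
	return engine_max_ver >= 7 ? 7 : 6;
}
```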
Required-githooks: true

Signed-off-by: Kenneth Cain [email protected]

kccain, Aug 15 '22 21:08

Bug-tracker data: Ticket title is 'Add support for container query functionality (open/close/creation time)'. Status is 'In Progress'. Labels: 'Metadata'. https://daosio.atlassian.net/browse/DAOS-3916

github-actions[bot], Aug 15 '22 21:08

Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9985/1/execution/node/167/log

daosbuild1, Aug 15 '22 21:08

Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9985/2/execution/node/120/log

daosbuild1, Aug 15 '22 21:08

To be considered, as discussed with @johannlombardi today: the performance implications of this patch performing an rdb update of the metadata access time on every operation (especially query). That is, can we avoid rdb_tx_update in some cases?

We definitely want to track open times, so updating what the patch currently calls "atime" is needed. We could change the implementation so that container query only looks up (rather than modifies) the metadata times, and likely do the same for all other read-only metadata operations. In any case, those operations are typically preceded by a container open, which already updates the rdb (handle index KVS).
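The proposal above amounts to a per-operation write policy: opens write atime, read-only operations only look up the times, and modifying operations write both. A minimal sketch follows. The operation classes (`SKETCH_OP_*`) and function names are hypothetical, not the real container opcodes.

```c
#include <stdbool.h>

/* Hypothetical operation classes; not real DAOS container opcodes. */
enum sketch_md_op {
	SKETCH_OP_OPEN,     /* container open: always tracked */
	SKETCH_OP_QUERY,    /* read-only */
	SKETCH_OP_GET_PROP, /* read-only */
	SKETCH_OP_SET_PROP, /* modifies metadata */
};

/* Which metadata-time fields, if any, an operation writes. */
enum sketch_md_write {
	SKETCH_WRITE_NONE,        /* lookup only: no rdb_tx_update needed */
	SKETCH_WRITE_ATIME,
	SKETCH_WRITE_ATIME_MTIME,
};

static enum sketch_md_write sketch_md_write_policy(enum sketch_md_op op)
{
	switch (op) {
	case SKETCH_OP_OPEN:
		return SKETCH_WRITE_ATIME;
	case SKETCH_OP_QUERY:
	case SKETCH_OP_GET_PROP:
		return SKETCH_WRITE_NONE;
	case SKETCH_OP_SET_PROP:
		return SKETCH_WRITE_ATIME_MTIME;
	}
	/* Unknown operation: conservatively treat it as modifying. */
	return SKETCH_WRITE_ATIME_MTIME;
}

/* True if the operation's rdb transaction must also write metadata times. */
static bool sketch_md_needs_update(enum sketch_md_op op)
{
	return sketch_md_write_policy(op) != SKETCH_WRITE_NONE;
}
```

Keeping the policy in one function makes it easy to see which operations would still incur an rdb update.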

Looking ahead, the envisioned follow-up patch (the preferred use case) would provide a pool list/filter containers API that would not require a container open handle and should not require rdb updates.

kccain, Aug 16 '22 13:08

Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9985/3/execution/node/138/log

daosbuild1, Aug 18 '22 01:08

Test stage Unit Test on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9985/3/execution/node/575/log

daosbuild1, Aug 18 '22 02:08

Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9985/4/execution/node/144/log

daosbuild1, Aug 18 '22 13:08

Clean test run; opening up for reviews now.

The following failures from build 4 are unrelated to this patch. For the functional HW daos_test failure, I have restarted testing in that stage (now build 5) to get a clean run of that intermittently failing test.

  • checkpatch: expected for anything that touches the CaRT RPC macros.
  • codespell: seems to affect everything on master. A separate PR (10031) landed to fix it after this PR was pushed for CI testing.
  • Test Hardware / Functional Hardware Medium / POOL14: pool connect access based on ACL – FTEST_daos_test.DAOS_Pool: this may be related to the old intermittent "dmg_helpers" test code failure documented in DAOS-10301. A new ticket, DAOS-11434, has been created because that code was recently revised after the first bug was fixed.

kccain, Aug 19 '22 15:08

Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9985/6/execution/node/145/log

daosbuild1, Aug 23 '22 15:08

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9985/6/execution/node/965/log

daosbuild1, Aug 24 '22 03:08

In build 6 all tests passed. However, in Functional Hardware Medium, there is a log stating "ERROR: Detected one or more tests that failed archiving!" https://build.hpdd.intel.com/blue/rest/organizations/jenkins/pipelines/daos-stack/pipelines/daos/branches/PR-9985/runs/6/nodes/899/steps/965/log/?start=0

Unless I find a reason this patch is responsible for the above, I think we should finish the review and request a force landing.

kccain, Aug 24 '22 11:08

@daos-stack/daos-gatekeeper: when landing this PR, can the commit message text in GitHub be used rather than the commit message from the first push? It has been updated to match the code as it changed during the review.

kccain, Aug 25 '22 14:08