DAOS-8331 client: Export client metrics via agent

Open mjmac opened this issue 2 years ago • 18 comments

Adds new agent config parameters and code to optionally export client metrics in Prometheus format.

Example daos_agent.yml updates:
  telemetry_port: 9192   # export on port 9192
  telemetry_retain: 5m   # retain metrics for 5 minutes after client exit

Change-Id: I77864682cc19fa4c33f326d879e20704ef57a7ea
Required-githooks: true
Signed-off-by: Michael MacDonald [email protected]
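
For readers who want to check the exported data, here is a minimal sketch of reading Prometheus text-format metrics from an agent configured with `telemetry_port: 9192`. It rests on assumptions not stated in this PR: the `/metrics` path, the `localhost` host, and every metric name in `SAMPLE` are hypothetical and chosen only for illustration. The parser handles the common `name{labels} value` and `name value` sample lines and skips `# HELP` / `# TYPE` comments; it is not a full implementation of the exposition format.

```python
import urllib.request

# Assumption: the agent serves Prometheus text format at /metrics on the
# configured telemetry_port. The path and host are not confirmed by this PR.
AGENT_METRICS_URL = "http://localhost:9192/metrics"


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {series: float value}."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labeled series: everything up to the last "}" is the series key.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            fields = line.split()
            series, rest = fields[0], fields[1:]
        # rest[0] is the value; an optional trailing timestamp is ignored.
        samples[series] = float(rest[0])
    return samples


def fetch_agent_metrics(url=AGENT_METRICS_URL, timeout=5.0):
    """Fetch and parse metrics from a running agent (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical sample payload; these metric names are NOT taken from DAOS.
SAMPLE = """\
# HELP example_client_io_ops_total Hypothetical counter, for illustration only.
# TYPE example_client_io_ops_total counter
example_client_io_ops_total{pid="1234",op="read"} 42
example_client_io_ops_total{pid="1234",op="write"} 7
example_client_uptime_seconds 12.5
"""

if __name__ == "__main__":
    for series, value in sorted(parse_prometheus_text(SAMPLE).items()):
        print(series, value)
```

Against a live agent, `fetch_agent_metrics()` would replace the `SAMPLE` parsing. With `telemetry_retain: 5m`, the description above implies a client's series would remain available for about five minutes after that client exits.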

mjmac avatar Dec 28 '23 19:12 mjmac

Bug-tracker data:
Ticket title is 'Client side metrics/stats support for DAOS'
Status is 'Awaiting Verification'
Labels: 'HPE'
https://daosio.atlassian.net/browse/DAOS-8331

github-actions[bot] avatar Dec 28 '23 19:12 github-actions[bot]

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13545/6/display/redirect

daosbuild1 avatar Jan 05 '24 18:01 daosbuild1

Should there be new entries added to utils/config/daos_agent.yml?

Yes, good catch. I forgot about those.

Also, could I ask for this small change? It would allow functional tests (specifically, the performance tests) to set the config: 578d907. And since it's unused, no special testing is needed.

I'll merge that in, thanks. Actually, I may try to add a ftest for this work, so that change makes it even easier.

mjmac avatar Jan 11 '24 17:01 mjmac

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13545/10/execution/node/284/log

daosbuild1 avatar Jan 31 '24 18:01 daosbuild1

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13545/10/execution/node/279/log

daosbuild1 avatar Jan 31 '24 18:01 daosbuild1

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13545/10/execution/node/366/log

daosbuild1 avatar Jan 31 '24 18:01 daosbuild1

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13545/10/execution/node/361/log

daosbuild1 avatar Jan 31 '24 18:01 daosbuild1

I'll merge that in, thanks. Actually, I may try to add a ftest for this work, so that change makes it even easier.

Just refreshed this patch. I did add the agent_utils_params.py changes. I have not gotten to adding the ftest yet. As these metrics are still somewhat of a WIP, IMO it's premature to add tests that are expecting fixed sets of metrics while we're iterating. I agree with @wangdi1 that we should add the ftest later.

mjmac avatar Jan 31 '24 18:01 mjmac

Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13545/10/execution/node/480/log

daosbuild1 avatar Jan 31 '24 18:01 daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13545/10/execution/node/347/log

daosbuild1 avatar Jan 31 '24 18:01 daosbuild1

Bug-tracker data:
Ticket title is 'Client side metrics/stats support for DAOS'
Status is 'Awaiting Verification'
Labels: 'HPE'
https://daosio.atlassian.net/browse/DAOS-8331

github-actions[bot] avatar Feb 20 '24 21:02 github-actions[bot]

Functional on EL 9 Test Results (old)

135 tests   131 :white_check_mark:  1h 31m 39s :stopwatch:
 41 suites    4 :zzz:
 41 files      0 :x:

Results for commit 98945324.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Feb 20 '24 23:02 github-actions[bot]

Functional on EL 8.8 Test Results (old)

135 tests   131 :white_check_mark:  1h 29m 5s :stopwatch:
 41 suites    4 :zzz:
 41 files      0 :x:

Results for commit 98945324.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Feb 20 '24 23:02 github-actions[bot]

Functional Hardware Medium Test Results (old)

130 tests   104 :white_check_mark:  2h 9m 52s :stopwatch:
 34 suites   26 :zzz:
 34 files      0 :x:

Results for commit 98945324.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Feb 21 '24 07:02 github-actions[bot]

Functional Hardware Medium Verbs Provider Test Results (old)

55 tests   54 :white_check_mark:  4h 7m 31s :stopwatch:
 7 suites   1 :zzz:
 7 files     0 :x:

Results for commit 98945324.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Feb 21 '24 11:02 github-actions[bot]

Functional Hardware Large Test Results (old)

64 tests   64 :white_check_mark:  28m 42s :stopwatch:
14 suites   0 :zzz:
14 files     0 :x:

Results for commit 98945324.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Feb 21 '24 11:02 github-actions[bot]

Bug-tracker data:
Ticket title is 'Client side metrics/stats support for DAOS'
Status is 'Awaiting Verification'
Labels: 'HPE'
https://daosio.atlassian.net/browse/DAOS-8331

github-actions[bot] avatar Feb 21 '24 16:02 github-actions[bot]

Requesting early reviews while waiting for the base patch to land, TIA.

mjmac avatar Feb 26 '24 17:02 mjmac

Closed in favor of the approach in #14030.

mjmac avatar Mar 24 '24 14:03 mjmac