[WIP] Batch collect gap metrics collection

Open kbrock opened this issue 3 years ago • 15 comments

Overview

When we detect that metrics capture has not been run for a while, or never run for that matter, we submit gap/historical metrics requests.

Follow-up to https://github.com/ManageIQ/manageiq/issues/20071

Problem

Since queue entries are split out by day, we could end up putting hundreds of requests onto the queue for only a dozen VMs.

Solution

When a gap is detected for inventory items, send the gap collection request in batches. The change is minimal since the work to handle batches of inventory items has already been implemented.

For a coordinator with a troublesome C&U installation, gaps have been by far the biggest cause of overextending our queue.

Now it puts a dozen messages on the queue for hundreds of VMs. The backend has already been changed to handle batched collection. This PR changes the coordinator to send the collection requests in batches.
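As a rough illustration of the idea (not the code in this PR), the sketch below builds one message per day and per batch of ids instead of one per day and per VM. The method names, the hash layout, and the `BATCH_SIZE` constant are hypothetical; the value 250 is taken from the enhancement notes in this description.

```ruby
require "date"

# Hypothetical batch size; this description mentions 250 ids per batch.
BATCH_SIZE = 250

# Split the gap into one-day ranges (gaps are still batched by date).
def day_ranges(start_date, end_date)
  (start_date...end_date).map { |day| [day, day + 1] }
end

# Build one message per (day, batch of ids) instead of one per (day, id).
# The hash layout is illustrative only and is not ManageIQ's queue format.
def gap_messages(ids, start_date, end_date, batch_size: BATCH_SIZE)
  # Sorting first means the same ids land in the same batch on every run.
  batches = ids.sort.each_slice(batch_size).to_a
  day_ranges(start_date, end_date).flat_map do |from, to|
    batches.map { |batch| {:ids => batch, :start => from, :end => to} }
  end
end

# 300 VMs with a 6-day gap: 2 batches per day * 6 days.
messages = gap_messages((1..300).to_a.shuffle, Date.new(2022, 11, 1), Date.new(2022, 11, 7))
puts messages.size # 12
```

Without batching, the same input would produce 300 * 6 = 1800 messages.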

This PR is split up into many improvements. They build on each other a little but are relatively independent:

testing:

  • refactor perf_process to make profiling easier
  • tests ignore batching and date ranges to make sporadic test failures less likely
  • drop unused metrics test helpers
  • use ids instead of objects for metrics to make test failures easier to read

performance:

  • [x] #22191
  • [x] https://github.com/ManageIQ/manageiq/pull/22287

enhancements:

  • use same last_perf_capture_on for the batch to make batching easier in the future
  • sort batches so ids are sent more consistently
  • send historical gaps as batches of ids (but still batch by date). This is a big performance improvement because it produces 250 times fewer queue entries (# vms * # days / 250).
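The queue-entry arithmetic above can be checked with a short sketch. Both helper names are hypothetical, and the default batch size of 250 is an assumption based on the figure quoted in the last bullet.

```ruby
# Queue entries when every VM gets its own message for every day of the gap.
def unbatched_entries(vm_count, day_count)
  vm_count * day_count
end

# Queue entries when ids are grouped into batches but still split by day.
# batch_size defaults to 250, the figure quoted above (an assumption here).
def batched_entries(vm_count, day_count, batch_size = 250)
  vm_count.fdiv(batch_size).ceil * day_count
end

puts unbatched_entries(1_000, 30) # 30000
puts batched_entries(1_000, 30)   # 120
```

The full 250x reduction only shows up when the VM count is a multiple of the batch size; a dozen VMs over 30 days would go from 360 entries to 30.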

kbrock avatar Nov 09 '22 03:11 kbrock

update:

  • fix cops
  • fix failing test (missed a rename)

kbrock avatar Nov 09 '22 16:11 kbrock

@Fryguy The commits look good. I just left this as WIP because I wanted to discuss which of these commits we want to pull out into separate PRs.

Let me know if you want to discuss them (or if some of the commits need better documentation)

kbrock avatar Dec 09 '22 18:12 kbrock

This pull request has been automatically marked as stale because it has not been updated for at least 3 months.

If these changes are still valid, please remove the stale label, make any changes requested by reviewers (if any), and ensure that this issue is being looked at by the assigned/reviewer(s)

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot avatar Apr 10 '23 00:04 miq-bot

This pull request is not mergeable. Please rebase and repush.

miq-bot avatar Jun 06 '23 18:06 miq-bot

This pull request has been automatically closed because it has not been updated for at least 3 months.

Feel free to reopen this pull request if these changes are still valid.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot avatar Jun 12 '23 00:06 miq-bot

This pull request has been automatically closed because it has not been updated for at least 3 months.

Feel free to reopen this pull request if these changes are still valid.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot avatar Sep 18 '23 00:09 miq-bot

we still have more work to do here

kbrock avatar Dec 15 '23 06:12 kbrock

Checked commits https://github.com/kbrock/manageiq/compare/e70ca4eb30535534610afef327db1dc6d79b9117~...fa6f72b17d8ec8f51d5849188c2eeccfc4e7ba46 with ruby 2.7.8, rubocop 1.56.3, haml-lint 0.51.0, and yamllint

5 files checked, 0 offenses detected

Everything looks fine. :cake:

miq-bot avatar Dec 15 '23 06:12 miq-bot

This pull request has been automatically marked as stale because it has not been updated for at least 3 months.

If these changes are still valid, please remove the stale label, make any changes requested by reviewers (if any), and ensure that this issue is being looked at by the assigned/reviewer(s).

miq-bot avatar Mar 18 '24 00:03 miq-bot

This pull request has been automatically marked as stale because it has not been updated for at least 3 months.

If these changes are still valid, please remove the stale label, make any changes requested by reviewers (if any), and ensure that this issue is being looked at by the assigned/reviewer(s).

miq-bot avatar Jun 24 '24 00:06 miq-bot