Investigate and smooth out load spikes on prod worker nodes

Open gkc opened this issue 3 years ago • 9 comments

Lead: @murali-shris

Describe the bug

There are periodic large load spikes that correspond with scheduled jobs (compaction, scans, ...) and are straining our worker nodes.

Expected outcome

  • Shared documented understanding of which scheduled jobs are driving load
  • Adjusted job schedules that spread load roughly evenly over time

gkc • Jan 24 '22 13:01

@cconstab @cpswan Can you please grant me access to the required monitoring dashboards so I can investigate this issue?

murali-shris • Jan 31 '22 06:01

Two possible causes of the CPU spikes:

  • sec check, which runs every 15 mins. It creates an inbound connection to the secondary and runs a scan.
  • hive expiry check, a scheduled job on the server which runs every 10 mins. It scans every key in hive and checks for expiry.
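
As a rough illustration of why the two intervals above can compound, the following Python sketch computes how often a 15-minute job and a 10-minute job fire in the same minute. It assumes both jobs start from the same reference time, which the thread does not state; the function names are hypothetical and not part of at_server.

```python
from math import gcd

# Intervals taken from the two bullets above, in minutes.
SEC_CHECK_MIN = 15
HIVE_EXPIRY_MIN = 10


def coincidence_period(a: int, b: int) -> int:
    """Least common multiple: how often two fixed-interval jobs fire together."""
    return a * b // gcd(a, b)


def coinciding_minutes(a: int, b: int, horizon: int) -> list[int]:
    """Minutes in (0, horizon] at which both jobs fire.

    Assumption: both jobs start at minute 0 and never drift.
    """
    return [t for t in range(1, horizon + 1) if t % a == 0 and t % b == 0]


if __name__ == "__main__":
    print(coincidence_period(SEC_CHECK_MIN, HIVE_EXPIRY_MIN))       # 30
    print(coinciding_minutes(SEC_CHECK_MIN, HIVE_EXPIRY_MIN, 120))  # [30, 60, 90, 120]
```

Under that alignment assumption the two checks would overlap every 30 minutes, so any fixed-interval schedule shared by many secondaries on one node tends to stack load at the same instants.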

murali-shris • Feb 02 '22 08:02

Merged a PR to randomise the hive expiry check: https://github.com/atsign-foundation/at_server/pull/497
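
The linked PR is not reproduced here. As a generic sketch of what randomising a periodic job can look like (not the at_server implementation, which is written in Dart), the Python below gives each instance a random first-run offset and a jittered interval. The helper names and the 20% jitter fraction are assumptions chosen for illustration.

```python
import random

BASE_INTERVAL_S = 10 * 60  # the 10-minute hive expiry interval mentioned earlier


def first_run_offsets(instances: int, base_s: int, rng: random.Random) -> list[int]:
    """Random initial offset in [0, base_s) seconds per instance (hypothetical helper).

    Spreading the first run stops all instances from scanning at the same moment.
    """
    return [rng.randrange(base_s) for _ in range(instances)]


def next_delay(base_s: float, jitter_fraction: float, rng: random.Random) -> float:
    """Return base_s plus or minus up to jitter_fraction of base_s (hypothetical helper)."""
    spread = base_s * jitter_fraction
    return base_s + rng.uniform(-spread, spread)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable
    offsets = first_run_offsets(1000, BASE_INTERVAL_S, rng)

    # Count how many of the 1000 instances would start in each one-minute bucket.
    buckets = [0] * (BASE_INTERVAL_S // 60)
    for off in offsets:
        buckets[off // 60] += 1
    print(buckets)

    # Delay before the following run, jittered by up to 20% (assumed value).
    print(round(next_delay(BASE_INTERVAL_S, 0.2, rng), 1))
```

With a uniform offset, each one-minute bucket should receive roughly a tenth of the instances, rather than all of them landing on the same tick.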

murali-shris • Feb 04 '22 08:02

Moving the task to the next sprint (PR-30) so we can validate performance once the changes are deployed.

sitaram-kalluri • Feb 07 '22 09:02

@cconstab @cpswan Is this issue still occurring in prod? Is there any work related to load spikes to be done in the upcoming sprint?

murali-shris • Feb 21 '22 04:02

I will take a look, or @cpswan will.

cconstab • Feb 21 '22 04:02

@murali-shris things are a lot better, but I'm still seeing some hourly spikes, so maybe another scheduled job elsewhere in the secondary?

cpswan • Feb 21 '22 12:02

@murali-shris @cpswan should we move priority up to high for PR43 sprint planning?

ksanty • Jul 28 '22 02:07

> @murali-shris @cpswan should we move priority up to high for PR43 sprint planning?

Yes @ksanty, we can revisit whether the prod spike still exists.

murali-shris • Jul 28 '22 07:07