Move reaper logic into worker, avoiding bottlenecks

Open AlanCoding opened this issue 2 years ago • 0 comments

SUMMARY

I'm hoping that this can be the final piece of the puzzle to address all the reports of bad behavior by the reaper with jobs in the "waiting" status.

Another piece is up at https://github.com/ansible/awx/pull/12573

With this change, we will still sometimes see slowness as self.pool.up() is called to expand the worker pool, but we will remove the risky behavior of running (potentially slow) database queries periodically from the main dispatcher process.

It is believed that slowness in the main dispatcher process can cause a pileup of incoming messages, which is what leaves jobs in the waiting state.

This moves the .cleanup() logic out of the main process and into a worker process, where it runs at the same period as before. The local reaper remains a side effect of the cluster_node_heartbeat task, as it was before. Instead of running the logic itself, the main dispatcher process attaches the necessary parameters to the task, so that the logic can be run in the main body of the task.
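
To illustrate the hand-off described above, here is a minimal sketch of the pattern, not the actual AWX dispatcher code. It assumes a thread and two in-memory queues as stand-ins for the worker process and its message channel, and every name in it (`cleanup`, `dispatcher_tick`, `run_once`, the `ref_time` and `active_task_ids` parameters) is hypothetical. The point is only that the main loop packages the parameters and returns immediately, while the potentially slow work happens in the worker.

```python
import queue
import threading


def cleanup(params):
    """Stand-in for the slow, database-backed cleanup/reaper logic (hypothetical)."""
    return {
        "reaped_before": params["ref_time"],
        "excluded": sorted(params["active_task_ids"]),
    }


def worker(tasks, results):
    """Worker loop: runs whatever the dispatcher hands over, including cleanup."""
    while True:
        message = tasks.get()
        if message is None:  # sentinel value: shut the worker down
            break
        if message["task"] == "cleanup":
            results.put(cleanup(message["params"]))


def dispatcher_tick(tasks, active_task_ids, now):
    """Main-process side: attach the needed parameters and return immediately."""
    tasks.put({
        "task": "cleanup",
        "params": {"ref_time": now, "active_task_ids": set(active_task_ids)},
    })


def run_once(active_task_ids, now):
    """Wire one dispatcher tick to one worker and return the cleanup result."""
    tasks, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(tasks, results))
    thread.start()
    dispatcher_tick(tasks, active_task_ids, now)  # never blocks on cleanup
    outcome = results.get(timeout=5)
    tasks.put(None)
    thread.join()
    return outcome
```

Because `dispatcher_tick` only enqueues a message, a slow `cleanup` delays the worker rather than the loop that consumes incoming messages.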

ISSUE TYPE
  • Bug or Docs Fix
COMPONENT NAME
  • API
ADDITIONAL INFORMATION

This will have conflicts as other merges happen. That's fine, there shouldn't be any problem resolving them.

AlanCoding, Jul 27 '22 15:07