[tech-debt][performance] Avoid contextually inappropriate references to UJT.last_job
SUMMARY
Related to https://github.com/ansible/awx/issues/13512
ISSUE TYPE
- Bug, Docs Fix or other nominal change
COMPONENT NAME
- API
This is very similar in nature to https://github.com/ansible/awx/pull/13124, but attacking a different problem. The shared theme is fighting the entropy of unnecessarily complex and confusing call stacks when saves to jobs and their related (polymorphic) templates.
I really like this patch. This would be a meaningful tech debt win, and it's very unlikely that other planned work will overlap with what this is changing because it's at the level of the core job models.
Failing CI items, converted to draft
I found these logs from integration tests. The unexpected updates to parent instances are a real problem. The deadlock here is extremely concerning, because it happens in canceling/shutdown events and we have a lot of risk there for broader system deadlock. While this PR might need more testing, this is one of the best things we can do to avoid these situations. A canceled job needs to update the job record and exit fast, and this blocks that.
2024-02-22 04:42:58,862 WARNING [cc896a75fa9245ed93332279170e429b] awx.main.dispatch Sending SIGTERM to task id=6163b146-2eff-4219-8713-6f07f23d6b57, task=awx.main.tasks.jobs.RunJob, args=[2152]
2024-02-22 04:42:58,862 WARNING [cc896a75fa9245ed93332279170e429b] awx.main.dispatch Received control signal:
{'control': 'cancel', 'task_ids': ['0ef11231-933d-4712-bd19-256db51903f1'], 'reply_to': None}
2024-02-22 04:42:58,863 WARNING [cc896a75fa9245ed93332279170e429b] awx.main.dispatch Sending SIGTERM to task id=0ef11231-933d-4712-bd19-256db51903f1, task=awx.main.tasks.jobs.RunJob, args=[2153]
2024-02-22 04:42:59,181 ERROR [6e31ac6539574c2b926687c67802eadc] awx.main.dispatch Worker failed to run task awx.main.tasks.jobs.RunJob(*[2153], **{}
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
raise ex.with_traceback(None)
psycopg.errors.DeadlockDetected: deadlock detected
DETAIL: Process 593369 waits for ShareLock on transaction 138247; blocked by process 593371.
Process 593371 waits for ShareLock on transaction 138252; blocked by process 593369.
HINT: See server log for query details.
CONTEXT: while updating tuple (2,49) in relation "main_unifiedjobtemplate"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/update_model.py", line 34, in update_model
instance.save(update_fields=update_fields)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/unified_jobs.py", line 926, in save
self._update_parent_instance()
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/unified_jobs.py", line 844, in _update_parent_instance
parent_instance.save(update_fields=update_fields)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/jobs.py", line 356, in save
return super(JobTemplate, self).save(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/unified_jobs.py", line 291, in save
super(UnifiedJobTemplate, self).save(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/polymorphic/models.py", line 87, in save
return super().save(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/base.py", line 344, in save
super(PrimordialModel, self).save(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/base.py", line 192, in save
super(CreatedModifiedModel, self).save(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/mixins.py", line 612, in save
super().save(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/base.py", line 814, in save
self.save_base(
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/base.py", line 876, in save_base
parent_inserted = self._save_parents(cls, using, update_fields)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/base.py", line 918, in _save_parents
updated = self._save_table(
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/base.py", line 990, in _save_table
updated = self._do_update(
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/base.py", line 1054, in _do_update
return filtered._update(values) > 0
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/query.py", line 1231, in _update
return query.get_compiler(self.db).execute_sql(CURSOR)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1984, in execute_sql
cursor = super().execute_sql(result_type)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
cursor.execute(sql, params)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
return self._execute_with_wrappers(
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
raise ex.with_traceback(None)
django.db.utils.OperationalError: deadlock detected
DETAIL: Process 593369 waits for ShareLock on transaction 138247; blocked by process 593371.
Process 593371 waits for ShareLock on transaction 138252; blocked by process 593369.
HINT: See server log for query details.
CONTEXT: while updating tuple (2,49) in relation "main_unifiedjobtemplate"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/dispatch/worker/task.py", line 103, in perform_work
result = self.run_callable(body)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/dispatch/worker/task.py", line 78, in run_callable
return _call(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 94, in _wrapped
return f(self, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/signals.py", line 81, in _wrapped
return f(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 484, in run
self.instance = self.update_model(pk, status='running', start_args='') # blank field to remove encrypted passwords
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 121, in update_model
return update_model(self.model, pk, _attempt=0, _max_attempts=self.update_attempts, **updates)
File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/update_model.py", line 48, in update_model
raise RuntimeError(f'Could not fetch {pk} because of receiving abort signal')
RuntimeError: Could not fetch 2153 because of receiving abort signal
2024-02-22 04:42:59,226 WARNING [8d40d8cf8607417f82a3e0b30e38f019] awx.main.dispatch checking dispatcher cancel for automationcontroller-0
2024-02-22 04:42:59,237 WARNING [406bfc42b48f43fa8c585bde8fac7bac] awx.main.dispatch Received control signal:
{'control': 'cancel', 'reply_to': 'reply_to_f9550861_5d0a_4ee9_860f_2c99b67be19c', 'task_ids': ['e1e1a67f-f5e2-43c0-9678-711acf552431']}
2024-02-22 04:42:59,244 WARNING [406bfc42b48f43fa8c585bde8fac7bac] awx.main.dispatch Sending SIGTERM to task id=e1e1a67f-f5e2-43c0-9678-711acf552431, task=awx.main.tasks.jobs.RunAdHocCommand, args=[2169]
2024-02-22 04:42:59,342 INFO [fc5a4e7c0574436382955763183e9e2b] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 2158