
Catch and warn about TaskclusterRestFailure during pulse job ingestion.

Open gmierz opened this issue 2 years ago • 2 comments

This patch fixes an issue that occurs when testing the Treeherder backend locally. Sometimes pulse jobs reference tasks that do not exist, and these cause unrecoverable failures. This patch adds a try/except around pulse message/job handling to catch the TaskclusterRestFailure behind those failures, and logs a warning when that happens.
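As a rough illustration of the pattern described above (not the actual diff), the handler can be sketched as follows. The `TaskclusterRestFailure` class here is a local stand-in for the exception of the same name in the `taskcluster` client, and `fetch_task` and `handle_pulse_message` are hypothetical names chosen for this example.

```python
import logging

logger = logging.getLogger(__name__)


class TaskclusterRestFailure(Exception):
    """Local stand-in (assumption) for the taskcluster client's exception."""


def fetch_task(task_id):
    # Hypothetical lookup that simulates a 404 for a non-existent task.
    raise TaskclusterRestFailure(
        f"`{task_id}` does not correspond to a task that exists."
    )


def handle_pulse_message(task_id, fetch=fetch_task):
    """Return the task definition, or None if the task lookup fails."""
    try:
        return fetch(task_id)
    except TaskclusterRestFailure as e:
        # Warn and skip the message instead of letting the error propagate.
        logger.warning("Failed to parse pulse message: %s", e)
        return None
```

With this shape, a message for a missing task produces one warning and is dropped, while messages for existing tasks are processed as before.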

gmierz avatar May 10 '23 16:05 gmierz

After this fix, this is what the failures look like:

[2023-05-10 16:22:13,631] WARNING [treeherder.etl.tasks.pulse_tasks:44] Failed to parse pulse message: `HtGJWcP0T0iaHRBFCNHZJg` does not correspond to a task that exists.
Are you sure this task has already been submitted?

---

* method:     task
* errorCode:  ResourceNotFound
* statusCode: 404
* time:       2023-05-10T16:22:13.594Z
[2023-05-10 16:22:13,631: WARNING/ForkPoolWorker-1] Failed to parse pulse message: `HtGJWcP0T0iaHRBFCNHZJg` does not correspond to a task that exists.
Are you sure this task has already been submitted?

---

* method:     task
* errorCode:  ResourceNotFound
* statusCode: 404
* time:       2023-05-10T16:22:13.594Z

Previously, we would get an unrecoverable failure:

celery_1           | [2023-05-10 12:08:52,336: INFO/ForkPoolWorker-1] Task store-pulse-tasks[16090e55-b271-4f29-9fc0-431780315912] retry: Retry in 10s: TaskclusterRestFailure('`bFX2cO0FRWm4tmiv09c-HA` does not correspond to a task that exists.\nAre you sure this task has already been submitted?\n\n---\n\n* method:     task\n* errorCode:  ResourceNotFound\n* statusCode: 404\n* time:       2023-05-10T12:08:52.257Z')
celery_1           | [2023-05-10 11:46:43,551: CRITICAL/MainProcess] Unrecoverable error: TypeError("__init__() missing 1 required positional argument: 'superExc'")
celery_1           | Traceback (most recent call last):
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/worker/worker.py", line 203, in start
celery_1           |     self.blueprint.start(self)
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
celery_1           |     step.start(parent)
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/bootsteps.py", line 365, in start
celery_1           |     return self.obj.start()
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 332, in start
celery_1           |     blueprint.start(self)
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
celery_1           |     step.start(parent)
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 628, in start
celery_1           |     c.loop(*c.loop_args())
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/worker/loops.py", line 97, in asynloop
celery_1           |     next(loop)
celery_1           |   File "/usr/local/lib/python3.9/site-packages/kombu/asynchronous/hub.py", line 362, in create_loop
celery_1           |     cb(*cbargs)
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/concurrency/asynpool.py", line 325, in on_result_readable
celery_1           |     next(it)
celery_1           |   File "/usr/local/lib/python3.9/site-packages/celery/concurrency/asynpool.py", line 306, in _recv_message
celery_1           |     message = load(bufv)
celery_1           | TypeError: __init__() missing 1 required positional argument: 'superExc'
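One plausible reading of the traceback above: the `TypeError` is raised while the main process loads a result sent back by a pool worker (`message = load(bufv)`), which suggests the exception could not be rebuilt after serialization. Python re-creates an exception by calling its class with `exc.args`, so an exception whose `__init__` requires an extra argument that never reaches `args` fails at that step. The sketch below reproduces that mechanism with a hypothetical `RestFailure` class; it is an assumption about the cause, not a confirmed diagnosis.

```python
class RestFailure(Exception):
    """Hypothetical exception with an extra required constructor argument."""

    def __init__(self, msg, superExc):
        # Only msg is passed up, so self.args ends up holding just the message.
        Exception.__init__(self, msg)
        self.superExc = superExc


def rebuild(exc):
    """Re-create an exception the way unpickling does: cls(*args)."""
    cls, args = exc.__reduce__()[:2]
    return cls(*args)


original = RestFailure("task not found", None)
try:
    rebuild(original)
except TypeError as err:
    # The rebuild call omits superExc, so __init__ rejects it.
    print(f"rebuild failed: {err}")
```

If this is indeed the cause, catching the exception inside the task, as this patch does, keeps it from ever crossing the process boundary.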

gmierz avatar May 10 '23 16:05 gmierz

Please run `black treeherder/etl/tasks/pulse_tasks.py` and amend your commit.

jmaher avatar May 10 '23 16:05 jmaher