Collect Events error: Unable to insert and return data in 10 attempts
Error with some of the large repositories:

```
Traceback (most recent call last):
  File "/opt/venv/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/augur/augur/tasks/github/events.py", line 46, in collect_events
    collection_strategy.collect(repo_git, key_auth, core_data_last_collected)
  File "/augur/augur/tasks/github/events.py", line 275, in collect
    self._collect_and_process_issue_events(owner, repo, repo_id, key_auth, since)
  File "/augur/augur/tasks/github/events.py", line 331, in _collect_and_process_issue_events
    self._insert_contributors(contributors)
  File "/augur/augur/tasks/github/events.py", line 85, in _insert_contributors
    batch_insert_contributors(self._logger, contributors)
  File "/augur/augur/application/db/lib.py", line 293, in batch_insert_contributors
    bulk_insert_dicts(logger, batch, Contributor, ['cntrb_id'])
  File "/augur/augur/application/db/lib.py", line 406, in bulk_insert_dicts
    raise Exception("Unable to insert and return data in 10 attempts")
Exception: Unable to insert and return data in 10 attempts
```
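For context on the "10 attempts" part of the message: the bulk insert is retried a fixed number of times, and the exception is only raised once every attempt has failed. The snippet below is a hedged sketch of that general pattern under lock contention, assuming SQLAlchemy and a PostgreSQL ON CONFLICT upsert; it is not Augur's actual `bulk_insert_dicts`, and the function and parameter names are illustrative.

```python
# Hedged sketch only -- NOT Augur's bulk_insert_dicts. It illustrates the
# generic "retry an upsert N times, then give up" pattern that produces a
# message like the one in the traceback when every attempt fails.
import time

from sqlalchemy import Table
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.engine import Engine
from sqlalchemy.exc import OperationalError


def bulk_upsert_with_retries(engine: Engine, table: Table, rows: list[dict],
                             conflict_cols: list[str], attempts: int = 10) -> None:
    """Upsert rows, retrying a fixed number of times on lock errors."""
    stmt = insert(table).values(rows)
    # Update every non-key column when a row with the same key already exists.
    update_cols = {c.name: c for c in stmt.excluded if c.name not in conflict_cols}
    stmt = stmt.on_conflict_do_update(index_elements=conflict_cols, set_=update_cols)

    for attempt in range(attempts):
        try:
            with engine.begin() as conn:  # one transaction per attempt
                conn.execute(stmt)
            return
        except OperationalError:
            # Deadlocks surface as OperationalError; back off briefly and retry.
            time.sleep(0.1 * (2 ** attempt))
    raise Exception(f"Unable to insert and return data in {attempts} attempts")
```

Under heavy concurrent collection, if all of the attempts hit deadlocks on the same contributor rows, the final exception is the one shown in the traceback above.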
Hi Cali,
It looks like this and #3170 may have the same root cause.
Both are deadlocking on the contributors table; it is likely that one of the collection tasks caused the initial deadlock and the other is just caught up in it.
@Ulincsys: Yes, you are correct about this. I still see these on the large test instance, though far more of the events "10 attempts" errors than of the contributors table contention. I think the contributors table contention is mitigated by hash partitioning the table on an instance of this size, which is what I am doing on our large test instance.
This is an issue with large instances of Augur that are collecting on more than 20 repositories simultaneously. We do not think changes to the general Augur architecture are necessary.
Perhaps some instructions on hash partitioning the contributors table, and possibly locating it on a different physical disk, would be a useful approach for very large Augur instances. This strategy is working on our largest test instance (which we think is also the largest Augur instance overall).
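In case it helps with the docs, here is a rough sketch of what hash partitioning a contributors-style table can look like on PostgreSQL 11+. The table name, columns, and partition count are illustrative only, not Augur's actual schema or migration path.

```python
# Rough sketch: create a hash-partitioned contributors-style table (PostgreSQL 11+).
# Table/column names and partition count are illustrative, not Augur's real DDL.
import psycopg2

N_PARTITIONS = 8  # spread inserts across N child tables to reduce lock contention

PARENT_DDL = """
CREATE TABLE contributors_partitioned (
    cntrb_id    uuid NOT NULL,
    cntrb_login text,
    data        jsonb,
    PRIMARY KEY (cntrb_id)
) PARTITION BY HASH (cntrb_id);
"""

# Each partition could also be given its own TABLESPACE to place it on a
# different physical disk, per the suggestion above.
PARTITION_DDL = [
    f"CREATE TABLE contributors_p{i} PARTITION OF contributors_partitioned "
    f"FOR VALUES WITH (MODULUS {N_PARTITIONS}, REMAINDER {i});"
    for i in range(N_PARTITIONS)
]


def create_partitioned_contributors(dsn: str) -> None:
    """Create the parent table and its hash partitions in one transaction."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(PARENT_DDL)
            for stmt in PARTITION_DDL:
                cur.execute(stmt)
    # The connection context manager commits on success and rolls back on error.
    conn.close()
```

Migrating existing data into the partitioned table and pointing the application at it are separate steps, omitted here.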
@cdolfi: Let me know if you want that hash partitioning strategy documented to close this issue, or what other resolution you have in mind.
This events issue still happens, but only rarely once the contributors table is hash partitioned. Effectively, we think contention on the contributor rows is the underlying lock issue that percolates up as the events failure in this case.
@sgoggins Yes, docs on the hash partitioning would be great, thanks!
Closing as a duplicate of https://github.com/chaoss/augur/issues/3170, because both are symptoms of deadlocking on the contributors table (as suggested by @Ulincsys).