augur
augur copied to clipboard
Message Collection Timeout: Large Repo/First Run
There is an issue with long running message collection first jobs, where they sometimes do not finish for large repos due to the following error:
Traceback (most recent call last):
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
self.dialect.do_execute(
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL connection has been closed unexpectedly
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 43, in collect_github_messages
process_large_issue_and_pr_message_collection(repo_id, repo_git, logger, manifest.key_auth, task_name, augur_db)
File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 109, in process_large_issue_and_pr_message_collection
process_messages(all_data, task_name, repo_id, logger, augur_db)
File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 212, in process_messages
augur_db.insert_data(contributors, Contributor, ["cntrb_id"])
File "/home/ubuntu/github/augur/augur/application/db/session.py", line 188, in insert_data
raise e
File "/home/ubuntu/github/augur/augur/application/db/session.py", line 175, in insert_data
connection.execute(stmnt)
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1416, in execute
return meth(
^^^^^
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1848, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1988, in _exec_single_context
self._handle_dbapi_exception(
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2343, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
self.dialect.do_execute(
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly
[SQL: INSERT INTO augur_data.contributors (cntrb_id, cntrb_login, cntrb_email, cntrb_full_name, cntrb_company, cntrb_created_at, cntrb_location, cntrb_canonical, cntrb_last_used, gh_user_id, gh_login, gh_url, gh_html_url, gh_node_id, gh_avatar_url, gh_gravatar_id, gh_followers_url, gh_following_url, gh_gists_url, gh_starred_url, gh_subscriptions_url, gh_organizations_url, gh_repos_url, gh_events_url, gh_received_events_url, gh_type, gh_site_admin, tool_source, tool_version, data_source) VALUES (%(cntrb_id_m0)s::UUID, %(cntrb_login_m0)s, %(cntrb_email_m0)s, %(cntrb_full_name_m0)s, %(cntrb_company_m0)s, %(cntrb_created_at_m0)s, %(cntrb_location_m0)s, %(cntrb_canonical_m0)s, %(cntrb_last_used_m0)s, %(gh_user_id_m0)s, %(gh_login_m0)s, %(gh_url_m0)s, %(gh_html_url_m0)s, %(gh_node_id_m0)s, %(gh_avatar_url_m0)s, %(gh_gravatar_id_m0)s, %(gh_followers_url_m0)s, %(gh_following_url_m0)s, %(gh_gists_url_m0)s, %(gh_starred_url_m0)s, %(gh_subscriptions_url_m0)s, %(gh_organizations_url_m0)s, %(gh_repos_url_m0)s, %(gh_events_url_m0)s, %(gh_received_events_url_m0)s, %(gh_type_m0)s, %(gh_site_admin_m0)s, %(tool_source_m0)s, %(tool_version_m0)s, %(data_source_m0)s), (%(cntrb_id_m1)s::UUID, %(cntrb_login_m1)s, %(cntrb_email_m1)s, %(cntrb_full_name_m1)s, %(cntrb_company_m1)s, %(cntrb_created_at_m1)s, %(cntrb_location_m1)s, %(cntrb_canonical_m1)s, %(cntrb_last_used_m1)s, %(gh_user_id_m1)s, %(gh_login_m1)s, %(gh_url_m1)s, %(gh_html_url_m1)s, %(gh_node_id_m1)s, %(gh_avatar_url_m1)s, %(gh_gravatar_id_m1)s, %(gh_followers_url_m1)s, %(gh_following_url_m1)s, %(gh_gists_url_m1)s, %(gh_starred_url_m1)s, %(gh_subscriptions_url_m1)s, %(gh_organizations_url_m1)s, %(gh_repos_url_m1)s, %(gh_events_url_m1)s, %(gh_received_events_url_m1)s, %(gh_type_m1)s, %(gh_site_admin_m1)s, %(tool_source_m1)s, %(tool_version_m1)s, %(data_source_m1)s), (%(cntrb_id_m2)s::UUID, %(cntrb_login_m2)s, %(cntrb_email_m2)s, %(cntrb_full_name_m2)s, %(cntrb_company_m2)s, %(cntrb_created_at_m2)s, %(cntrb_location_m2)s, %(cntrb_canonical_m2)s, %(cntrb_last_used_m2)s, %(gh_user_id_m2)s, %(gh_login_m2)s, %(gh_url_m2)s, %(gh_html_url_m2)s, %(gh_node_id_m2)s, %(gh_avatar_url_m2)s, %(gh_gravatar_id_m2)s, %(gh_followers_url_m2)s, %(gh_following_url_m2)s, %(gh_gists_url_m2)s, %(gh_starred_url_m2)s, %(gh_subscriptions_url_m2)s, %(gh_organizations_url_m2)s, %(gh_repos_url_m2)s, %(gh_events_url_m2)s, %(gh_received_events_url_m2)s, %(gh_type_m2)s, %(gh_site_admin_m2)s, %(tool_source_m2)s, %(tool_version_m2)s, %(data_source_m2)s), (%(cntrb_id_m3)s::UUID, %(cntrb_login_m3)s, %(cntrb_email_m3)s, %(cntrb_full_name_m3)s, %(cntrb_company_m3)s, %(cntrb_created_at_m3)s, %(cntrb_location_m3)s, %(cntrb_canonical_m3)s, %(cntrb_last_used_m3)s, %(gh_user_id_m3)s, %(gh_login_m3)s, %(gh_url_m3)s, %(gh_html_url_m3)s, %(gh_node_id_m3)s, %(gh_avatar_url_m3)s, %(gh_gravatar_id_m3)s, %(gh_followers_url_m3)s, %(gh_following_url_m3)s, %(gh_gists_url_m3)s, %(gh_starred_url_m3)s, %(gh_subscriptions_url_m3)s, %(gh_organizations_url_m3)s, %(gh_repos_url_m3)s, %(gh_events_url_m3)s, %(gh_received_events_url_m3)s, %(gh_type_m3)s, %(gh_site_admin_m3)s, %(tool_source_m3)s, %(tool_version_m3)s, %(data_source_m3)s) ON CONFLICT (cntrb_id) DO UPDATE SET cntrb_id = excluded.cntrb_id, cntrb_login = excluded.cntrb_login, cntrb_email = excluded.cntrb_email, cntrb_full_name = excluded.cntrb_full_name, cntrb_company = excluded.cntrb_company, cntrb_created_at = excluded.cntrb_created_at, cntrb_location = excluded.cntrb_location, cntrb_canonical = excluded.cntrb_canonical, cntrb_last_used = excluded.cntrb_last_used, gh_user_id = excluded.gh_user_id, gh_login = excluded.gh_login, gh_url = excluded.gh_url, gh_html_url = excluded.gh_html_url, gh_node_id = excluded.gh_node_id, gh_avatar_url = excluded.gh_avatar_url, gh_gravatar_id = excluded.gh_gravatar_id, gh_followers_url = excluded.gh_followers_url, gh_following_url = excluded.gh_following_url, gh_gists_url = excluded.gh_gists_url, gh_starred_url = excluded.gh_starred_url, gh_subscriptions_url = excluded.gh_subscriptions_url, gh_organizations_url = excluded.gh_organizations_url, gh_repos_url = excluded.gh_repos_url, gh_events_url = excluded.gh_events_url, gh_received_events_url = excluded.gh_received_events_url, gh_type = excluded.gh_type, gh_site_admin = excluded.gh_site_admin, tool_source = excluded.tool_source, tool_version = excluded.tool_version, data_source = excluded.data_source]
[parameters: {'cntrb_id_m0': UUID('01000027-9900-0000-0000-000000000000'), 'cntrb_login_m0': 'ghost', 'cntrb_email_m0': None, 'cntrb_full_name_m0': None, 'cntrb_company_m0': None, 'cntrb_created_at_m0': None, 'cntrb_location_m0': None, 'cntrb_canonical_m0': None, 'cntrb_last_used_m0': None, 'gh_user_id_m0': 10137, 'gh_login_m0': 'ghost', 'gh_url_m0': 'https://api.github.com/users/ghost', 'gh_html_url_m0': 'https://github.com/ghost', 'gh_node_id_m0': 'MDQ6VXNlcjEwMTM3', 'gh_avatar_url_m0': 'https://avatars.githubusercontent.com/u/10137?v=4', 'gh_gravatar_id_m0': '', 'gh_followers_url_m0': 'https://api.github.com/users/ghost/followers', 'gh_following_url_m0': 'https://api.github.com/users/ghost/following{/other_user}', 'gh_gists_url_m0': 'https://api.github.com/users/ghost/gists{/gist_id}', 'gh_starred_url_m0': 'https://api.github.com/users/ghost/starred{/owner}{/repo}', 'gh_subscriptions_url_m0': 'https://api.github.com/users/ghost/subscriptions', 'gh_organizations_url_m0': 'https://api.github.com/users/ghost/orgs', 'gh_repos_url_m0': 'https://api.github.com/users/ghost/repos', 'gh_events_url_m0': 'https://api.github.com/users/ghost/events{/privacy}', 'gh_received_events_url_m0': 'https://api.github.com/users/ghost/received_events', 'gh_type_m0': 'User', 'gh_site_admin_m0': False, 'tool_source_m0': 'Pr comment task', 'tool_version_m0': '2.0', 'data_source_m0': 'Github API', 'cntrb_id_m1': UUID('0106da09-7500-0000-0000-000000000000'), 'cntrb_login_m1': 'stephengillie', 'cntrb_email_m1': None, 'cntrb_full_name_m1': None, 'cntrb_company_m1': None, 'cntrb_created_at_m1': None, 'cntrb_location_m1': None, 'cntrb_canonical_m1': None, 'cntrb_last_used_m1': None, 'gh_user_id_m1': 114952565, 'gh_login_m1': 'stephengillie', 'gh_url_m1': 'https://api.github.com/users/stephengillie', 'gh_html_url_m1': 'https://github.com/stephengillie', 'gh_node_id_m1': 'U_kgDOBtoJdQ', 'gh_avatar_url_m1': 'https://avatars.githubusercontent.com/u/114952565?v=4', 'gh_gravatar_id_m1': '', 'gh_followers_url_m1': 'https://api.github.com/users/stephengillie/followers', 'gh_following_url_m1': 'https://api.github.com/users/stephengillie/following{/other_user}', 'gh_gists_url_m1': 'https://api.github.com/users/stephengillie/gists{/gist_id}', 'gh_starred_url_m1': 'https://api.github.com/users/stephengillie/starred{/owner}{/repo}' ... 20 parameters truncated ... 'gh_login_m2': 'wingetbot', 'gh_url_m2': 'https://api.github.com/users/wingetbot', 'gh_html_url_m2': 'https://github.com/wingetbot', 'gh_node_id_m2': 'MDQ6VXNlcjYzODE2OTk5', 'gh_avatar_url_m2': 'https://avatars.githubusercontent.com/u/63816999?v=4', 'gh_gravatar_id_m2': '', 'gh_followers_url_m2': 'https://api.github.com/users/wingetbot/followers', 'gh_following_url_m2': 'https://api.github.com/users/wingetbot/following{/other_user}', 'gh_gists_url_m2': 'https://api.github.com/users/wingetbot/gists{/gist_id}', 'gh_starred_url_m2': 'https://api.github.com/users/wingetbot/starred{/owner}{/repo}', 'gh_subscriptions_url_m2': 'https://api.github.com/users/wingetbot/subscriptions', 'gh_organizations_url_m2': 'https://api.github.com/users/wingetbot/orgs', 'gh_repos_url_m2': 'https://api.github.com/users/wingetbot/repos', 'gh_events_url_m2': 'https://api.github.com/users/wingetbot/events{/privacy}', 'gh_received_events_url_m2': 'https://api.github.com/users/wingetbot/received_events', 'gh_type_m2': 'User', 'gh_site_admin_m2': False, 'tool_source_m2': 'Pr comment task', 'tool_version_m2': '2.0', 'data_source_m2': 'Github API', 'cntrb_id_m3': UUID('01023116-4900-0000-0000-000000000000'), 'cntrb_login_m3': 'azure-pipelines[bot]', 'cntrb_email_m3': None, 'cntrb_full_name_m3': None, 'cntrb_company_m3': None, 'cntrb_created_at_m3': None, 'cntrb_location_m3': None, 'cntrb_canonical_m3': None, 'cntrb_last_used_m3': None, 'gh_user_id_m3': 36771401, 'gh_login_m3': 'azure-pipelines[bot]', 'gh_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D', 'gh_html_url_m3': 'https://github.com/apps/azure-pipelines', 'gh_node_id_m3': 'MDM6Qm90MzY3NzE0MDE=', 'gh_avatar_url_m3': 'https://avatars.githubusercontent.com/in/9426?v=4', 'gh_gravatar_id_m3': '', 'gh_followers_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/followers', 'gh_following_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/following{/other_user}', 'gh_gists_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/gists{/gist_id}', 'gh_starred_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/starred{/owner}{/repo}', 'gh_subscriptions_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/subscriptions', 'gh_organizations_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/orgs', 'gh_repos_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/repos', 'gh_events_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/events{/privacy}', 'gh_received_events_url_m3': 'https://api.github.com/users/azure-pipelines%5Bbot%5D/received_events', 'gh_type_m3': 'Bot', 'gh_site_admin_m3': False, 'tool_source_m3': 'Pr comment task', 'tool_version_m3': '2.0', 'data_source_m3': 'Github API'}]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1119, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 692, in do_rollback
dbapi_connection.rollback()
psycopg2.InterfaceError: connection already closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 22, in collect_github_messages
with GithubTaskManifest(logger) as manifest:
File "/home/ubuntu/github/augur/augur/tasks/github/util/github_task_session.py", line 26, in __exit__
self.augur_db.close()
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2466, in close
self._close_impl(invalidate=False)
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2535, in _close_impl
transaction.close(invalidate)
File "<string>", line 2, in close
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
ret_value = fn(self, *arg, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1362, in close
transaction.close()
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2576, in close
self._do_close()
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2714, in _do_close
self._close_impl()
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2700, in _close_impl
self._connection_rollback_impl()
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2692, in _connection_rollback_impl
self.connection._rollback_impl()
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1121, in _rollback_impl
self._handle_dbapi_exception(e, None, None, None, None)
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2343, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1119, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 692, in do_rollback
dbapi_connection.rollback()
sqlalchemy.exc.InterfaceError: (psycopg2.InterfaceError) connection already closed
(Background on this error at: https://sqlalche.me/e/20/rvf5)
Suggested Fix:
- Test for loss of connection
- If connection is lost, open a new one instead of losing all the data in a transaction and setting status to fail
- Big issue is that "core collection" includes a lot of jobs, and when this one fails, all of them get rerun on the same repository again and again.
Specifics:
- https://github.com/microsoft/winget-pkgs
- Received 2024-08-05 14:38:37.440960 UTC
- Started 2024-08-05 14:38:37.444868 UTC
- Failed 2024-08-09 06:51:56.523601 UTC
This query :
select msg_id, msg_timestamp, data_collection_date
from message
where repo_id=50183 order by msg_timestamp desc;
shows that the message data is actually quite current:
msg_id msg_timestamp data_collection_date
1187399851 2024-08-07 23:24:23 2024-08-08 07:31:31
1187971112 2024-08-07 16:07:19 2024-08-08 16:24:42
1187971111 2024-08-07 15:50:44 2024-08-08 16:24:42
A reverse order of the same query show our earliest messages are from April, 2020:
msg_id msg_timestamp data_collection_date
32392606 2020-04-30 21:38:44 2023-05-11 06:24:54
32392607 2020-04-30 21:38:53 2023-05-11 06:24:54
32392608 2020-04-30 21:41:03 2023-05-11 06:24:54
So, our message collection seems "complete" as April 2020 is when this project started.
Waiting to see if this occurs again