[Bug][Github] Error inserting raw rows into _raw_github_api_pull_request_commits (500)

mkaufmaner opened this issue 2 years ago • 10 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

The GitHub plugin pipeline task fails for a GitHub repository with:

  • ~ 125,000 commits
  • ~ 40,000 pull requests (open & closed)

Full Logs (Sanitized): task-129911-2-1-github.log.zip

Simplified Task JSON:

{
    "id": 125907,
    "createdAt": "2023-10-25T19:15:17.77Z",
    "updatedAt": "2023-10-25T19:19:15.662Z",
    "plugin": "github",
    "subtasks": [
        "collectApiPullRequests",
        "extractApiPullRequests",
        "collectApiComments",
        "extractApiComments",
        "collectApiPullRequestCommits",
        "extractApiPullRequestCommits",
        "collectApiPullRequestReviews",
        "extractApiPullRequestReviews",
        "collectApiPrReviewCommentsMeta",
        "extractApiPrReviewComments",
        "collectAccounts",
        "extractAccounts",
        "collectAccountOrg",
        "ExtractAccountOrg",
        "enrichPullRequestIssues",
        "convertRepo",
        "convertPullRequestCommits",
        "convertPullRequests",
        "convertPullRequestReviews",
        "convertPullRequestLabels",
        "convertPullRequestIssues",
        "convertPullRequestComments",
        "convertAccounts"
    ],
    "options": "{\"connectionId\":5,\"githubId\":509,\"name\":\"xxxx/xxxx-xxxx-xxx-xxxxxx-xxx\",\"timeAfter\":\"2023-04-24T00:00:00-04:00\"}",
    "status": "TASK_FAILED",
    "message": "subtask collectApiPullRequestCommits ended unexpectedly\nWraps: (2)\n  | combined messages: \n  | {\n  | \terror inserting raw rows into _raw_github_api_pull_request_commits (500)\n  | \t=====================\n [...] \terror inserting raw rows into _raw_github_api_pull_request_commits (500)\n  | }\nError types: (1) *hintdetail.withDetail (2) *errors.errorString",
    "errorName": "subtask collectApiPullRequestCommits ended unexpectedly\ncaused by: error inserting raw rows into _raw_github_api_pull_request_commits (500), [...] error inserting raw rows into _raw_github_api_pull_request_commits (500)"
    "progress": 0.17391305,
    "progressDetail": null,
    "failedSubTask": "collectApiPullRequestCommits",
    "pipelineId": 57,
    "pipelineRow": 6,
    "pipelineCol": 14,
    "beganAt": "2023-10-25T19:15:18.344Z",
    "finishedAt": "2023-10-25T19:19:15.655Z",
    "spentSeconds": 237
}
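
The message field above is stored as a single escaped string, which makes the combined errors hard to read. A minimal Go sketch (struct fields taken from the task JSON above; the file name is illustrative) that unmarshals a task record and prints the failure details with real newlines:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Task holds only the fields needed to inspect the failure; names match the JSON above.
type Task struct {
	ID            int     `json:"id"`
	Status        string  `json:"status"`
	FailedSubTask string  `json:"failedSubTask"`
	Progress      float64 `json:"progress"`
	Message       string  `json:"message"`
}

func main() {
	raw, err := os.ReadFile("task.json") // a single task object like the one shown above
	if err != nil {
		panic(err)
	}
	var t Task
	if err := json.Unmarshal(raw, &t); err != nil {
		panic(err)
	}
	fmt.Printf("task %d: %s (failed subtask: %s, progress: %.1f%%)\n",
		t.ID, t.Status, t.FailedSubTask, t.Progress*100)
	fmt.Println(t.Message) // escaped \n and \t become real newlines and tabs after unmarshal
}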

Full Task JSON (1.2MB): devlake-github-500.json

Screenshot: devlake-github-500-errors

What do you expect to happen

I expected this task to be successful.

How to reproduce

This is a good question. My guess is that it can be reproduced with a repository that has a significant number of PRs and commits.

Anything else

Possibly related to https://github.com/apache/incubator-devlake/issues/6320

Version

v0.18.0

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

mkaufmaner avatar Oct 26 '23 11:10 mkaufmaner

Are the error messages you see when you "hover to view the reason" the same as those in devlake-github-500.json?

I can't find any useful info in the JSON file (not your problem, the error messages are just too messy). @mkaufmaner

d4x1 avatar Oct 26 '23 12:10 d4x1

Are the error messages you see when you "hover to view the reason" the same as those in devlake-github-500.json?

I can't find any useful info in the JSON file (not your problem, the error messages are just too messy).

@mkaufmaner

Yes, they are.

I am trying to get the logs, but when the pod restarted with the updated nginx configuration, the logs got blown away. Rerunning the task now to get them; it will probably take a few minutes.

mkaufmaner avatar Oct 26 '23 13:10 mkaufmaner

@d4x1 I updated the main bug report with the full log for the failed GitHub task; uncompressed, the logs are 32.7 MB. See https://github.com/apache/incubator-devlake/files/13178290/task-129911-2-1-github.log.zip

mkaufmaner avatar Oct 26 '23 13:10 mkaufmaner

@d4x1 Upon further investigation, it appears to be due to a resource limitation on our GHE server, and the requests are timing out. I wish we could get GraphQL working, which we may be able to do after upgrading our GHE to the latest stable 3.10 release, but that is TBD.

Side note: is there a client-side request timeout I should be worried about? I tried looking through the source code and documentation but couldn't find anything.

mkaufmaner avatar Oct 26 '23 14:10 mkaufmaner

The default timeout for GitHub GraphQL requests is 30s, but you can change it by setting an environment variable named API_TIMEOUT. Keep in mind that the unit suffix is required: you can specify s for seconds, m for minutes, and h for hours.
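
The suffix requirement matches Go's duration syntax, which is presumably what parses the variable (an assumption; DevLake is written in Go). A minimal sketch showing which values would be accepted:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Candidate values for API_TIMEOUT; the unit suffix (s, m, h) is required.
	for _, v := range []string{"30s", "2m", "1h", "120"} {
		d, err := time.ParseDuration(v)
		if err != nil {
			// "120" fails: a bare number without a unit is rejected.
			fmt.Printf("%-4s -> invalid: %v\n", v, err)
			continue
		}
		fmt.Printf("%-4s -> %v\n", v, d)
	}
}

In a docker-compose or Helm deployment the variable would go into the backend container's environment (for example API_TIMEOUT=120s); the exact placement depends on your setup.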

klesh avatar Oct 27 '23 09:10 klesh

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Dec 27 '23 00:12 github-actions[bot]

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Feb 27 '24 00:02 github-actions[bot]

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar May 01 '24 00:05 github-actions[bot]

This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.

github-actions[bot] avatar May 09 '24 00:05 github-actions[bot]

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Jul 13 '24 00:07 github-actions[bot]

Closed due to inactivity.

klesh avatar Sep 09 '24 09:09 klesh