[Bug][cicd_pipelines] Fix pipelines with Github

Open daviddsp opened this issue 1 year ago • 16 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

Hi all, I have a question: DevLake is not collecting all the pipelines that are running or have run in my repo. The account is on GitHub Enterprise. GitHub reports 384 workflow runs, but the cicd_pipelines table in DevLake contains only 289 records. Has anyone else seen this? image-2 image-3

What do you expect to happen

All pipelines visible in GitHub should be present in DevLake.

How to reproduce

Run SELECT * FROM cicd_pipelines in your database and compare the number of rows with the number of workflow runs shown in GitHub; the numbers differ.

Anything else

No response

Version

v0.18.0, v0.19.0, v0.20.0

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

daviddsp avatar Jul 15 '24 19:07 daviddsp

Hello @daviddsp! I'm here to help you with any bugs, questions, or contributions you may have. Let's work together to solve the issue you're experiencing.

Could you provide more details on the following:

  1. Configuration Settings: Are there any specific configurations or filters set up in DevLake that might limit the data being retrieved from GitHub Enterprise?
  2. Sync Logs: Are there any error logs or warnings in the DevLake sync logs that might indicate issues during data retrieval?
  3. Data Range: Are the missing pipeline records within a specific date range or randomly distributed?
  4. API Rate Limits: Have you checked if there are any API rate limits being hit during the data synchronization process?
  5. Repository Access: Does the DevLake instance have full access to the repository in GitHub Enterprise, including all workflow runs?

These details will help in diagnosing the issue more accurately.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Jul 15 '24 19:07 dosubot[bot]

Please check the sync policy setting first.

klesh avatar Jul 16 '24 03:07 klesh

@klesh I checked the sync policy but I don't see anything strange. I've attached my policy configuration. image

daviddsp avatar Jul 17 '24 12:07 daviddsp

Hi @daviddsp, were all those 384 workflows created AFTER 2024-01-01?

klesh avatar Jul 18 '24 02:07 klesh

Yes @klesh, these workflows ran in April of this year. I even queried the GitHub API directly and it returns all the workflows without any problem.

daviddsp avatar Jul 18 '24 13:07 daviddsp

> Yes @klesh, these workflows ran in April of this year. I even queried the GitHub API directly and it returns all the workflows without any problem.

@daviddsp Can you check the raw and tool layer tables' data? They should have all workflows (384 in your case).

d4x1 avatar Jul 20 '24 03:07 d4x1

> > Yes @klesh, these workflows ran in April of this year. I even queried the GitHub API directly and it returns all the workflows without any problem.

> @daviddsp Can you check the raw and tool layer tables' data? They should have all workflows (384 in your case).

@d4x1 Yes, I checked the table _raw_github_api_runs, and it does not contain the total number of pipelines that the GitHub API returns. I tested this in different DevLake versions: data is lost in v0.20 and v0.21, while v0.18 and v0.17 work perfectly.

daviddsp avatar Jul 20 '24 04:07 daviddsp

@daviddsp Have you tried the latest version (v1.0.x)?

d4x1 avatar Jul 22 '24 02:07 d4x1

> @daviddsp Have you tried the latest version (v1.0.x)?

[Screenshots: 2024-07-22 at 3:47:57 PM, 3:48:18 PM, 3:49:00 PM]

Today I tested this version and it shows the same error. Screenshots are attached for reference.

daviddsp avatar Jul 22 '24 19:07 daviddsp

@daviddsp Thanks for your feedback. We have two guesses:

  1. Some workflow runs in your repo are not completed. The code at https://github.com/apache/incubator-devlake/blob/main/backend/plugins/github/tasks/cicd_run_collector.go#L97, added in v0.20, filters workflow runs so that only completed ones are kept. You can check your FULL data table to see whether it has records with any other status (not completed).
  2. Your instance was upgraded from an old version and something is wrong with the incremental collection mode. You can collect your project in FULL mode and see what happens.

Looking forward to your reply.
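The first guess above can be illustrated with a small sketch. This is not DevLake's actual collector code; `WorkflowRun` and `split_by_completion` are hypothetical names, and the sample runs are made up. It only shows how a "completed only" filter makes the stored count smaller than the total the GitHub API reports.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    # Hypothetical, minimal stand-in for a GitHub workflow run record.
    id: int
    status: str  # e.g. "completed", "in_progress", "queued", "waiting"


def split_by_completion(runs):
    """Return (kept, skipped): completed runs and all other runs."""
    kept = [r for r in runs if r.status == "completed"]
    skipped = [r for r in runs if r.status != "completed"]
    return kept, skipped


# Made-up sample data: two completed runs and two that are still open.
runs = [
    WorkflowRun(1, "completed"),
    WorkflowRun(2, "in_progress"),
    WorkflowRun(3, "completed"),
    WorkflowRun(4, "waiting"),
]
kept, skipped = split_by_completion(runs)
print(len(kept), len(skipped))  # prints "2 2"
```

If this guess were the cause, the gap between the API total and the stored rows (384 - 289 = 95 in the report) would equal the number of runs whose status is not "completed".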

d4x1 avatar Jul 23 '24 06:07 d4x1

@d4x1, thank you very much. I really want to resolve this issue.

I reviewed the code in this specific version of DevLake and understand that this filter only includes runs with a "completed" status. I have tested this and am sharing a new screenshot in which the table (SELECT * FROM _tool_github_runs;) only contains records with a "completed" status.

For Slack, I recommend running this query. It is quite strange, because the query does not return any records with a status other than "completed" from the table _raw_github_api_runs. Perhaps the change you mentioned has been implemented, resulting in the deletion of pipelines with statuses other than "completed".

Also, I always delete my database when I change to another version of DevLake, just to avoid compatibility problems.

GitHub currently shows the following totals of workflow runs, broken down by status:

  • waiting: 1
  • completed: 1,334
  • failure: 470
  • cancelled: 16
image image
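Instead of reading the output of SELECT * by eye, the rows can be grouped by status. The following is a minimal sketch, assuming the table has a `status` column; it uses an in-memory SQLite table with made-up rows as a stand-in for the real DevLake database, and `count_runs_by_status` is a hypothetical helper name.

```python
import sqlite3


def count_runs_by_status(conn, table="_tool_github_runs"):
    """Return {status: count} for a table assumed to have a `status` column."""
    # The table name is interpolated into the SQL, so pass only trusted,
    # hard-coded names here.
    rows = conn.execute(
        f"SELECT status, COUNT(*) FROM {table} GROUP BY status"
    ).fetchall()
    return dict(rows)


# Stand-in table with made-up rows; a real check would connect to the
# DevLake database instead of an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _tool_github_runs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO _tool_github_runs (id, status) VALUES (?, ?)",
    [(1, "completed"), (2, "completed"), (3, "waiting")],
)
print(count_runs_by_status(conn))
```

A result with "completed" as the only key would match the observation that no other status is stored.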

daviddsp avatar Jul 23 '24 17:07 daviddsp

> > > Yes @klesh, these workflows ran in April of this year. I even queried the GitHub API directly and it returns all the workflows without any problem.

> > @daviddsp Can you check the raw and tool layer tables' data? They should have all workflows (384 in your case).

> @d4x1 Yes, I checked the table _raw_github_api_runs, and it does not contain the total number of pipelines that the GitHub API returns. I tested this in different DevLake versions: data is lost in v0.20 and v0.21, while v0.18 and v0.17 work perfectly.

Sorry? What was the total number of records in _raw_github_api_runs for v0.20 and v0.21?

> @d4x1, thank you very much. I really want to resolve this issue.

> I reviewed the code in this specific version of DevLake and understand that this filter only includes runs with a "completed" status. I have tested this and am sharing a new screenshot in which the table (SELECT * FROM _tool_github_runs;) only contains records with a "completed" status.

> For Slack, I recommend running this query. It is quite strange, because the query does not return any records with a status other than "completed" from the table _raw_github_api_runs. Perhaps the change you mentioned has been implemented, resulting in the deletion of pipelines with statuses other than "completed".

> Also, I always delete my database when I change to another version of DevLake, just to avoid compatibility problems.

> GitHub currently shows the following totals of workflow runs, broken down by status:

>   • waiting: 1
>   • completed: 1,334
>   • failure: 470
>   • cancelled: 16

I don't understand the figures here.

If all the missing runs were NOT completed (for example pending or waiting for approval), then it is expected that they were deleted, because there is no point in analyzing them.

klesh avatar Jul 31 '24 08:07 klesh

Hi @daviddsp, a new version has been released: https://github.com/apache/incubator-devlake/releases/tag/v1.0.1-beta5 . Could you test it and check the log to confirm whether the missing runs were skipped?

klesh avatar Aug 02 '24 01:08 klesh

@klesh Perfect, I will test this version and send you feedback.

daviddsp avatar Aug 03 '24 21:08 daviddsp

@klesh I just tested this version and the problem is indeed solved: it now brings me the total number of pipelines that GitHub shows on the web. What was the problem?

image image

daviddsp avatar Aug 03 '24 22:08 daviddsp

@daviddsp I didn't change anything; I simply added log printing for when a run gets skipped 😂 https://github.com/apache/incubator-devlake/pull/7818/files
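For readers who want to picture that kind of change, here is a rough sketch of logging each skipped run. It is not the code from the linked pull request; the function name `keep_completed`, the logger name, and the message format are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("run_filter_sketch")  # hypothetical logger name


def keep_completed(runs):
    """Keep completed runs and log every run that is skipped."""
    kept = []
    for run in runs:
        if run.get("status") != "completed":
            # Made-up message format; it records which run was dropped and why.
            logger.info(
                "skipping run %s: status is %r", run.get("id"), run.get("status")
            )
            continue
        kept.append(run)
    return kept


logging.basicConfig(level=logging.INFO)
kept = keep_completed([
    {"id": 1, "status": "completed"},
    {"id": 2, "status": "in_progress"},
])
print([r["id"] for r in kept])  # prints "[1]"
```

With a log line per skipped run, the difference between the API total and the stored rows can be accounted for by counting those lines.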

klesh avatar Aug 05 '24 11:08 klesh