[Bug][cicd_pipelines] Fix pipelines with GitHub
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
Hi @here
I have a question: DevLake is not collecting all the pipelines that are running or have run in my repository. The account is on GitHub Enterprise. GitHub shows 384 workflow runs, but DevLake's cicd_pipelines table contains only 289 records.
Has anyone else run into this?
What do you expect to happen
All pipelines shown in GitHub should also appear in DevLake.
How to reproduce
Run SELECT * FROM cicd_pipelines; against the DevLake database and compare the number of rows with the number of workflow runs shown in GitHub; the two differ.
Anything else
No response
Version
v0.18.0, v0.19.0, v0.20.0
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Hello @daviddsp! I'm here to help you with any bugs, questions, or contributions you may have. Let's work together to solve the issue you're experiencing.
Could you provide more details on the following:
- Configuration Settings: Are there any specific configurations or filters set up in DevLake that might limit the data being retrieved from GitHub Enterprise?
- Sync Logs: Are there any error logs or warnings in the DevLake sync logs that might indicate issues during data retrieval?
- Data Range: Are the missing pipeline records within a specific date range or randomly distributed?
- API Rate Limits: Have you checked if there are any API rate limits being hit during the data synchronization process?
- Repository Access: Does the DevLake instance have full access to the repository in GitHub Enterprise, including all workflow runs?
These details will help in diagnosing the issue more accurately.
Please check the sync policy setting first.
@klesh I checked the sync policy but don't see anything strange. I've attached my policy configuration.
Hi @daviddsp, were all those 384 workflow runs created AFTER 2024-01-01?
Yes @klesh, these workflows ran in April of this year. I also queried the GitHub API directly, and it returns all the workflow runs without any problem.
@daviddsp Can you check the raw and tool layer tables' data? They should have all workflows (384 in your case).
@d4x1 Yes, I checked the _raw_github_api_runs table, and it does not contain the total number of pipelines returned by the GitHub API. I tested different DevLake versions: data is lost in v0.20.0 and v0.21.0, while v0.18.0 and v0.17.0 work perfectly.
@daviddsp Have you tried the latest version (v1.0.x)?
Today I tested this version and it shows the same error. I've attached screenshots for reference.
@daviddsp Thanks for your feedback. We have two guesses:
- Some workflow runs in your repo are not completed. The code at https://github.com/apache/incubator-devlake/blob/main/backend/plugins/github/tasks/cicd_run_collector.go#L97, added in v0.20, filters workflow runs and keeps only those that are completed. You can check the full data in your table to see whether it has records with any other status (not completed).
- Your instance was upgraded from an old version, and something went wrong with the incremental collection mode. You can collect your project in FULL mode and see what happens.
Looking forward to your reply.
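A minimal Go sketch of the kind of status filter described above, assuming a simplified, hypothetical WorkflowRun type with only ID, Status and Conclusion fields. This is not DevLake's actual collector code; it only illustrates why runs that are still waiting or in progress would not reach the tables, and how returning the skipped runs separately makes them visible instead of silently dropped.

```go
package main

import "fmt"

// WorkflowRun is a hypothetical, trimmed-down view of one workflow run
// from the GitHub REST API; only the fields needed here are included.
type WorkflowRun struct {
	ID         int64
	Status     string // e.g. "queued", "in_progress", "waiting", "completed"
	Conclusion string // e.g. "success", "failure", "cancelled"; empty until completed
}

// filterCompleted keeps only runs whose status is "completed" and returns
// the other runs separately, so the caller can report what was dropped.
func filterCompleted(runs []WorkflowRun) (kept, skipped []WorkflowRun) {
	for _, r := range runs {
		if r.Status == "completed" {
			kept = append(kept, r)
		} else {
			skipped = append(skipped, r)
		}
	}
	return kept, skipped
}

func main() {
	runs := []WorkflowRun{
		{ID: 1, Status: "completed", Conclusion: "success"},
		{ID: 2, Status: "completed", Conclusion: "failure"},
		{ID: 3, Status: "waiting"},
		{ID: 4, Status: "in_progress"},
	}
	kept, skipped := filterCompleted(runs)
	// Print one line per skipped run instead of discarding it silently.
	for _, r := range skipped {
		fmt.Printf("skipping run %d with status %q\n", r.ID, r.Status)
	}
	fmt.Printf("kept %d of %d runs\n", len(kept), len(runs))
}
```

Returning the skipped runs, rather than only the kept ones, is a design choice for this sketch: it lets the caller log or count them, which is what makes a gap between GitHub's count and the table's count explainable.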
@d4x1, thank you very much. I really want to resolve this issue.
I reviewed the code in this specific version of DevLake and understand that this filter only includes workflow runs with a "completed" status. I tested this and am sharing a new screenshot showing that the query SELECT * FROM _tool_github_runs; returns only records with a "completed" status.
For Slack, I recommend running this query. Strangely, it does not return any records with a status other than "completed" in the _raw_github_api_runs table. Perhaps the change you mentioned is already in effect, so pipelines with statuses other than "completed" are being deleted.
Also, I always delete my database when switching to another DevLake version, just to avoid compatibility problems.
These are the current workflow totals in GitHub, broken down by status:
- waiting: 1
- completed: 1,334
- failure: 470
- cancelled: 16
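When reading a breakdown like the one above, note that in the GitHub REST API a workflow run's status (such as queued, in_progress, waiting or completed) is a separate field from its conclusion (such as success, failure or cancelled), and conclusion stays null until the run completes. The Go sketch below, using only the standard library and a hypothetical rawRun type, tallies run JSON payloads by each field separately, which could help compare raw rows against GitHub's own counts. It assumes each payload is a single workflow-run JSON object; how _raw_github_api_runs actually stores its data is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawRun is a hypothetical minimal shape of a workflow-run JSON object;
// json.Unmarshal ignores all other fields.
type rawRun struct {
	Status     string  `json:"status"`
	Conclusion *string `json:"conclusion"` // nil when the JSON value is null (run not completed)
}

// tally counts runs by status and, separately, by conclusion.
func tally(payloads []string) (map[string]int, map[string]int, error) {
	byStatus := map[string]int{}
	byConclusion := map[string]int{}
	for _, p := range payloads {
		var r rawRun
		if err := json.Unmarshal([]byte(p), &r); err != nil {
			return nil, nil, err
		}
		byStatus[r.Status]++
		if r.Conclusion != nil {
			byConclusion[*r.Conclusion]++
		}
	}
	return byStatus, byConclusion, nil
}

func main() {
	payloads := []string{
		`{"id": 1, "status": "completed", "conclusion": "success"}`,
		`{"id": 2, "status": "completed", "conclusion": "failure"}`,
		`{"id": 3, "status": "completed", "conclusion": "cancelled"}`,
		`{"id": 4, "status": "waiting", "conclusion": null}`,
	}
	byStatus, byConclusion, err := tally(payloads)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("by status:", byStatus)
	fmt.Println("by conclusion:", byConclusion)
}
```

Under this reading, failed and cancelled runs still have the status "completed", so only runs like the single "waiting" one would be affected by a completed-only filter.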
Sorry? What was the total number of records in _raw_github_api_runs for v0.20 and v0.21?
I don't understand the figures here.
If all those missing runs were NOT completed (for example, pending or waiting for approval), then their being deleted is expected, because there is no point in analyzing them.
Hi @daviddsp, a new version has been released: https://github.com/apache/incubator-devlake/releases/tag/v1.0.1-beta5 Would you test it and check the log to confirm whether the missing runs were skipped?
@klesh Perfect, I will test this version and send you feedback.
@klesh I just tested this version and the problem was indeed solved: it now collects the same total number of pipelines that GitHub shows on the web. What was the problem?
@daviddsp I didn't change anything; I simply added a log message that is printed when a run gets skipped 😂 https://github.com/apache/incubator-devlake/pull/7818/files