[Bug][linker] LinkPrToIssue deletes all pull_request_issues entries from other projects
Search before asking
- [x] I had searched in the issues and found no similar issues.
What happened
Set Up: there are multiple projects set up in devlake with "Associate pull requests to issue" checked. The projects are on different git repos.
Collecting data from one project deletes all entries of pull_rquest_issues of other projects
What do you expect to happen
Data in pull_request_issues table of another project should not be touched.
How to reproduce
- Set up at least 2 projects in devlake with both jira and github
- Check Associate pull requests to issue
- Collect data from Project A
- Check data in pull_request_issues table -- all fine
- Collect data from Project B
- Check data in pull_request_issues table -- data for project B is correct but data from project A disappear.
Anything else
I think the issue is at https://github.com/apache/incubator-devlake/blob/v1.0.1/backend/plugins/linker/tasks/link_pr_and_issue.go#L48-L60
DELETE FROM pull_request_issues WHERE pull_request_id IN ( SELECT pr.id FROM pull_requests pr LEFT JOIN project_mapping pm ON pm.table = 'repos' AND pm.row_id = pr.base_repo_id AND pm.project_name = ? )
It should be join or if left join there should be a where clause to filter out a project
Version
v1.0.1
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
You are correct in your analysis: the current SQL DELETE statement in link_pr_and_issue.go is likely causing the issue where collecting data for one project deletes all pull_request_issues entries for other projects. The relevant code is as follows:
DELETE FROM pull_request_issues
WHERE pull_request_id IN (
SELECT pr.id
FROM pull_requests pr
LEFT JOIN project_mapping pm
ON pm.table = 'repos'
AND pm.row_id = pr.base_repo_id
AND pm.project_name = ?
)
This query uses a LEFT JOIN and filters by pm.project_name = ?, but because it's a LEFT JOIN without a corresponding WHERE clause to ensure only the intended project's records are affected, it can result in deleting entries from other projects as well. This matches your observation and the described behavior.
No similar issues or discussions were found in the repository, and there are no recent changes addressing this in the codebase since v1.0.1.
Suggested approach:
To fix this, the query should ensure that only records associated with the specific project are deleted. You can achieve this by using an INNER JOIN or by adding a WHERE clause to the subquery to strictly filter by the project. For example:
DELETE FROM pull_request_issues
WHERE pull_request_id IN (
SELECT pr.id
FROM pull_requests pr
INNER JOIN project_mapping pm
ON pm.table = 'repos'
AND pm.row_id = pr.base_repo_id
WHERE pm.project_name = ?
)
This change ensures that only pull_request_issues linked to pull requests belonging to the specified project are deleted, preventing data loss from other projects.
You can find the relevant code and logic in the clearHistoryData function in link_pr_and_issue.go.
Let me know if you need further details or guidance on implementing this change.
To continue talking to Dosu, mention @dosu-bot.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.