skara icon indicating copy to clipboard operation
skara copied to clipboard

1532: CSRBot is too inefficient

Open erikj79 opened this issue 2 years ago • 2 comments

This patch is a pretty major redesign of how the CSRBot polls for work. The motivation for this is described in the bug description. It's quite big, so I will try to break down the major changes.

  1. The CSRBot has been split into CSRIssueBot and CSRPullRequestBot. The issue bot polls the issue tracker (JBS) for work, while the pull request bot polls the forge (github). The WorkItem part of the old CSRBot has been moved to the new PullRequestWorkItem, and except for now only handling one single PR per WorkItem instance (instead of everything in a repo), has been left pretty much intact. All actual updates on PRs are (still) handled by this WorkItem. The new IssueWorkItem is created when a change to a CSR Issue has been detected (in JBS). It walks the JBS issue and PR links to identify all PRs that could possibly be related to that CSR, and creates PullRequestWorkItems for them.

  2. To be able to find PRs from JBS issues, the CSRBot needs to know how to parse the PR comment links. To better share this knowledge, I moved the creation and deletion logic from the NotifyBot to PullRequestUtils (in the forge module), which is available to all bots that need it. I also added a method for parsing the URL from such a comment.

  3. To efficiently query for updated CSR issues from Jira, I added two new query methods on IssueProject. A longer explanation of the new polling mechanism for issues can be found in the first bug comment. I've also tried to reasonably explain things in comments in the code.

  4. In JiraProject, there was a pretty serious bug in all query methods for Issues. The JiraIssue constructor takes a RestRequest as a parameter. This request object is supposed to be used by JiraIssue for generating all the subqueries for the Jira REST API. The problem was that the query methods in JiraProject just sent in whatever RestRequest they used to find issues, which broke all methods that tried to make further calls through the returned JiraIssue to Jira. I fixed this by always supplying a RestRequest object with the correct URL when creating a JiraIssue.

  5. To be able to call Jira with absolute timestamps in queries, we need to know the timezone of the user we are calling as. Luckily this information was available with a simple call, so I added this to the JiraHost class.

  6. Similar to how care needs to be taken when polling for updates to issues in Jira, I applied a similar solution when polling PRs. This required a new method HostedRepository.openPullRequestsAfter. Unfortunately, it doesn't help much on Github, where the API doesn't support such a parameter, but it works for Gitlab.

I have modified the tests that are affected by this change so that they still pass. This does verify that changes to issues in the IssueTracker are detected. Perhaps I should add some new tests to specifically verify the polling logic. I have verified the new queries and complete functionality manually by running the bot against the playground repo and bugs-stage.


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace

Issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/skara pull/1357/head:pull/1357
$ git checkout pull/1357

Update a local copy of the PR:
$ git checkout pull/1357
$ git pull https://git.openjdk.org/skara pull/1357/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1357

View PR using the GUI difftool:
$ git pr show -t 1357

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/skara/pull/1357.diff

erikj79 avatar Aug 12 '22 22:08 erikj79

:wave: Welcome back erikj! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Aug 12 '22 22:08 bridgekeeper[bot]

Webrevs

mlbridge[bot] avatar Aug 12 '22 22:08 mlbridge[bot]

I don't really know whether it is a good idea to has both CSRIssueBot and CSRPullRequestBot. It seems that one CSRBot contains two work items, IssueWorkItem and PullRequestWorkItem, works well too.

The reason for having two separate bots is that the CSRPullRequestBot has one instance for each repo, while the CSRIssueBot has one instance for each IssueProject. There are generally multiple repos that share the same IssueProject. It would be possible to have one single CSRBot for each IssueProject and just loop over all the repos in it. I chose to split them to keep each "bot" more focused on polling one kind of thing. Compare with the existing MailingListArchiveReaderBot and MailingListBridgeBot.

Or we can seperate the CSRIssueBot to a new module, like IssueBot, to be reused in the future. But it may exceed the range of this patch.

A separate module would be tricky as we need WorkItems that are specialized for CSR, but those wouldn't be accessible in the other module. I do agree that part of the implementation here would be nice to be able to reuse, but until we know how some other bot would like to reuse it, I would rather not complicate the current patch further.

erikj79 avatar Aug 15 '22 16:08 erikj79

I don't really know whether it is a good idea to has both CSRIssueBot and CSRPullRequestBot. It seems that one CSRBot contains two work items, IssueWorkItem and PullRequestWorkItem, works well too.

The reason for having two separate bots is that the CSRPullRequestBot has one instance for each repo, while the CSRIssueBot has one instance for each IssueProject. There are generally multiple repos that share the same IssueProject. It would be possible to have one single CSRBot for each IssueProject and just loop over all the repos in it. I chose to split them to keep each "bot" more focused on polling one kind of thing. Compare with the existing MailingListArchiveReaderBot and MailingListBridgeBot.

Or we can seperate the CSRIssueBot to a new module, like IssueBot, to be reused in the future. But it may exceed the range of this patch.

A separate module would be tricky as we need WorkItems that are specialized for CSR, but those wouldn't be accessible in the other module. I do agree that part of the implementation here would be nice to be able to reuse, but until we know how some other bot would like to reuse it, I would rather not complicate the current patch further.

Also, thank you for taking the time to review this!

erikj79 avatar Aug 15 '22 17:08 erikj79

I don't really know whether it is a good idea to has both CSRIssueBot and CSRPullRequestBot. It seems that one CSRBot contains two work items, IssueWorkItem and PullRequestWorkItem, works well too.

The reason for having two separate bots is that the CSRPullRequestBot has one instance for each repo, while the CSRIssueBot has one instance for each IssueProject. There are generally multiple repos that share the same IssueProject. It would be possible to have one single CSRBot for each IssueProject and just loop over all the repos in it. I chose to split them to keep each "bot" more focused on polling one kind of thing. Compare with the existing MailingListArchiveReaderBot and MailingListBridgeBot.

Or we can seperate the CSRIssueBot to a new module, like IssueBot, to be reused in the future. But it may exceed the range of this patch.

A separate module would be tricky as we need WorkItems that are specialized for CSR, but those wouldn't be accessible in the other module. I do agree that part of the implementation here would be nice to be able to reuse, but until we know how some other bot would like to reuse it, I would rather not complicate the current patch further.

Looks good to keep these two bots. I agree that the issue bot may be considered to be reused in the future, but not in this patch.

lgxbslgx avatar Aug 17 '22 06:08 lgxbslgx

@erikj79 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

:mag: One or more changes in this pull request modifies files in areas of the source code that often require two reviewers. Please consider if this is the case for this pull request, and if so, await a second reviewer to approve this pull request before you integrate it.

After integration, the commit message for the final commit will be:

1532: CSRBot is too inefficient

Reviewed-by: gli, ihse

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Aug 18 '22 11:08 openjdk[bot]

Thanks!

/integrate

erikj79 avatar Aug 19 '22 16:08 erikj79

Going to push as commit 972cbb685e9406ec9d245fd201e4182f7a2c0a6f. Since your change was applied there has been 1 commit pushed to the master branch:

  • f1227d0cbb6ed47b6e6d0c411697fd7b008e283e: 1355: Mirror bot needs branch selection based on patterns

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Aug 19 '22 16:08 openjdk[bot]

@erikj79 Pushed as commit 972cbb685e9406ec9d245fd201e4182f7a2c0a6f.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Aug 19 '22 16:08 openjdk[bot]