policy-bot Feature: Include PR Body in Approval Candidates

Feature: Include PR Body in Approval Candidates

Open agirlnamedsophia opened this issue 2 years ago • 10 comments

Thank you for building such a great tool, and already accepting several contributions from my team. We love it so far!

Part of our workflow involves providing a way for PR authors to opt out of some review requirements. Historically we've done this by scanning the PR Body Description for specific comments. It would be great to add this feature to PolicyBot by including the PR Body in the API Response and appending it to the list of comments that are used for approval analysis.

I can see this as pairing well with the "allow_author" feature:

options:
  # If true, approvals by the author of a pull request are considered when
  # calculating the status. False by default.
  allow_author: true

This feature addition would mean the author doesn't have to add an extra "issue comment" after they've opened a PR with a specific comment type, they can just add it into the PR body description. This might also be useful to other people who use bots to create/merge PRs who don't need specific types of review.

I'd be happy to take a stab at implementing this feature and opening a PR if you agree that this would be valuable.

May 06 '22 16:05 agirlnamedsophia

Thanks for using and contributing to Policy Bot! Before you implement this feature, I think it would help to have an example policy and workflow where approval in the PR body would help. What you described so far sounds like something we typically use predicates for, so while we don't have a PR body predicate right now, it may make more sense to add this as a new predicate type than as an additional source of approval candidates.

For an example with predicates, consider a case where authors apply labels to indicate they're opting out of review:

policy:
  approval:
    - or:
        - reviewers have approved
        - pull request has the no-review label
        
approval_rules:
  - name: reviewers have approved
    requires:
      count: 1
      teams: ["palantir/reviewers"]

  - name: pull request has the no-review label
    if:
      has_labels:
        - "no-review"
    requires:
      count: 0 # indicates "auto-approval"

You can imagine combing this with other predicates as well: perhaps the author has to be part of a certain team or the pull request has to only change certain files. If labels don't work for you, maybe a new body predicate (similar to the title predicate) could work.

To summarize, instead of having many ways for people to self-approve, Policy Bot is designed to encode conditions like this in the policy by using predicates and the logical operators to enable rules that require no reviews. I'm having trouble thinking of situations where it makes more sense to self-approve as part of the initial PR.

But there could be valid workflows that don't 't fit this model or that I haven't thought about, so if you think that's the case here, I'd like to learn more!

May 09 '22 03:05 bluekeyes

So our use-case is primarily around "optional" opt-ing out by the author. We currently use an in-house-built-tool that relies on comments made only by the author for this kind of thing, but we're trying to go all in with PolicyBot and it would make the transition easier for our engineers if we can mirror (to some extent) the current workflow. This might be unique to us, but I can imagine that providing a way to optionally opt-in or opt-out of review while you're opening a pull request might be beneficial to other folks as well who use comments to communicate the kind of review they want or don't want.

For example, an author might be a member of the "ruby-experts" group but sometimes they don't want auto-approval - instead they'd prefer to have their Ruby code reviewed if it's extra complicated. So they would open a pull request and request a GitHub review from someone as normal. But other times their changeset might be a small version bump of a dependency and they feel confident to opt-out of review. By including the PR Body in the logic of determining approvedness, an author can include a simple comment in the PR Body description to indicate they are explicitly choosing not to have a reviewer for their PR.

This is of course possible to some extent with labels and PR titles, but these are also editable by someone who isn't the author, which doesn't suit our compliance needs. For comment-based PR reviews, it would be nice to include the body in the approval logic.

I think it could be reasonable for this to be a predicate similar to the title feature that matches on specific strings, but I'm concerned about our needs to ignore edits by anyone who isn't the author. I understand it probably seems strange to include the PR Body in the approval candidates considering it's not a comment of any kind in the API, but I think the "ignore edits" logic that exists for approval candidates supports our needs here as well.

If you prefer I include this addition as a predicate instead, perhaps there's an additional refactor to add an ability to ignore edits to any predicates as well? Happy to take a stab at implementing this as a predicate that looks at the PR body or as an addition to the approval candidates list.

Here's an example of what this might look like in a policy.yml if we include the PR body in the list of approval candidates:

policy:
  approval:
    - no-platform review
approval_rules:
- name: no-platform review
  if:
    has_author_in:
      teams:
      - "Betterment/platform-reviewers"
  requires:
    count: 1
    teams:
    - "Betterment/platform-reviewers"
  options:
    allow_author: true
    ignore_edited_comments: true
    invalidate_on_push: false
    methods:
      comment_patterns:
        - '/no-platform'

May 10 '22 17:05 agirlnamedsophia

Thanks for the additional context. After thinking about it some, I see two option for implementing this:

Option 1: Body Predicate

Add a new body predicate, with a property to control whether edits are allowed. This would be similar to the existing title predicate:

if:
  body:
    ignore_edits: true
    matches:
      - "/no-platform"

The ignore_edits property might be better with a different name, depending on exactly what condition it enforces.

Option 2: New Approval Method

This is close to what you initially proposed. However, I'd prefer not to change behavior for existing users by including pull request bodies in the approval candidates, so I think we should add a new approval method:

options:
  methods:
    body_patterns:
      - "/no-platform"

If this is set, it would consider the body as an approval candidate, respecting other options (e.g. it could consider the body a "comment" for the purpose of ignore_edited_comments). Note that this method would mostly be incompatible with invalidate_on_push.

I personally lean towards the first option, because I like to think of policies like this in terms of conditions granting automatic approval. But I can see how others may prefer to think of the same thing as self-approval, especially if coming from another system. I'm willing to support either variant (or both), so if you'd still like to contribute this feature, I leave the choice up to you.

Some things to watch out for in the implementation:

Detecting edits of the pull request body. I think the pull request API is just different enough from comments that the same updated vs created time check might not work.
Making sure everything responds properly to PR events. There's a chance that PR events to not set the right triggers, leading to skipped evaluations now that things will use the PR body.

May 13 '22 04:05 bluekeyes

I think adding support for both could be interesting. I'll most likely open two separate PRs to support both features, and will probably go with the new "body as comment" (option 2) first, since I think adding support for ignoring edits for a new predicate might be a little more complicated. I'm not as familiar with that aspect of the codebase quite yet.

Option 1 Follow up questions

For the body predicate, how do you imagine supporting the ignore_edits feature? It would be ideal to only ignore edits from someone who is NOT the author.

The graphql API exposes editor (https://docs.github.com/en/graphql/reference/objects#pullrequest) (the author login of someone who "edited" the PR Body) and lastEditedAt (the timestamp the editor made the last edit). Would the predicate Evaluate method look at the PR Body from the prctx (e.g. from Evaluate(ctx context.Context, prctx pull.Context)) and if ignore_edits is set to true, only return a satisfied result if there haven't been edits AND there is a string match? Otherwise only return satisfied if there is a string match?

Option 2 Follow up questions

If we want to coerce the PR Body into a "comment" then we should probably set a "fake" updatedAt timestamp conditionally based on the presence of lastEditedAt. What do you think?

// In github.go in loadPagedData()
// Add body to comment list as a comment-quacking duck
if q.Repository.PullRequest.LastEditedAt > 0 {
    updatedAt = q.Repository.PullRequest.LastEditedAt
} else {
    updatedAt = q.Repository.PullRequest.CreatedAt
}
prBody := &Comment{
    CreatedAt: q.Repository.PullRequest.CreatedAt,
    UpdatedAt: updatedAt, 
    Author:    q.Repository.PullRequest.Author.Login,
    Body:      q.Repository.PullRequest.Body,
}

As for invalidate_on_push - I agree that the addition of a new approval method would be incompatible with invalidate_on_push. Is there any additional code to you think to support that claim? Should I just add a test case for demonstrating this?

May 16 '22 18:05 agirlnamedsophia

Would the predicate Evaluate method look at the PR Body [...] and if ignore_edits is set to true, only return a satisfied result if there haven't been edits AND there is a string match? Otherwise only return satisfied if there is a string match?

Yes, that sounds right to me. It might be possible to get previous body content from the API, in which case ignore_edits set to true could mean string matching always happens against the initial body, but that sounds confusing and complicated to me.

In either case, we'll have decide on which definition of "edited" hits the right balance of compliance and implementation complexity:

If any edits exist, the body is edited. This is how ignore_edited_comments currently works and is simplest to implement.
If the last edit is by a user other than the original author, the body is edited. This should also be simple to implement but encodes the assumption that the author always validates the entire body content when making edits. Otherwise, you could have a situation where A writes the body, B edits it, then A edits it again, leaving B's edit intact without realizing it. This may or may not be fine in your situation.
If any edits are by a user other than the author, the body is edited. This avoids the issue in (2) but adds complexity and may not be fully supported by current GitHub APIs.

If we want to coerce the PR Body into a "comment" then we should probably set a "fake" updatedAt timestamp conditionally based on the presence of lastEditedAt. What do you think?

Yes, although you'll want to do this in policy/common/methods.go and coerce the body into a common.Candidate rather than a pull.Comment. Note that this subjects the body to the logic of ignore_edited_comments, so you may need to further fake the updatedAt value depending on the definition of "edited" that you use.

As for invalidate_on_push [...] is there any additional code to you think to support that claim? Should I just add a test case for demonstrating this?

It should "work" (or I guess, not work) without any changes and doesn't need a dedicated test. I think updating the docs to mention that these are incompatible (i.e. once you push a change, there is no way to "reset" the body such that it is considered for approval) is enough.

May 16 '22 20:05 bluekeyes

So it looks like Editor and LastEditedAt are only available via the graphql API, but it seems that a github.pullRequest struct is used to represent PR data, from the github Golang library, but that does NOT support these fields. Am I thinking about this correctly? Is there a way to create the github PR context from a mixture of graphql responses and RESTful responses?

May 19 '22 15:05 agirlnamedsophia

Yeah, this should be possible. All of the parts of the application that do evaluation use pull.Context and other types from the pull package instead of the go-github types directly. While pull.Context is constructed using a github.PullRequest, internally, it uses a type created from GraphQL (see v4PullRequest, Locator#toV4()).

The most direct way to support this would be to remove the concept of "completeness" (see Locator#IsComplete()), so that every PR is reloaded with GraphQL to get extra fields. I'm hesitant to do this because it will add an extra API request in all cases, even when the new editor information is unused.

That suggests an alternate way to implement this is to define a new pull.Context method to get the body editor details and then to make the GraphQL request as part of that method. This way, only policies that make use of the body as an approval method will make additional requests.

May 19 '22 19:05 bluekeyes

Still going to implement this! Just had COVID the past two weeks so took a little pause. Will get back to it this week.

May 31 '22 14:05 agirlnamedsophia

Hi! Just getting back into this. Just to clarify - do you mean create an identical method from https://github.com/palantir/policy-bot/blob/develop/pull/context.go#L44-L117 (just calling it something like ContextWithEditorInfo which would have attributes related to lastEditedAt, etc)? I'm not certain how/where to slot that in the codebase to conditionally return that interface vs a pull.Context - or rather, I'm not certain how to do that in a clean and simple way.

Jul 07 '22 18:07 agirlnamedsophia

I think you can add a new method to the existing pull.Context interface (and the two implementations), maybe called Body(), that returns either a new type or a pull.Comment struct. Calling that method makes the necessary GraphQL requests to lookup the editor information (and ideally caches the result; see some of the existing methods for examples.)

Then, when implementing the logic for handling body approval, you can call Body() on the pull.Context instance that should already be available.

Jul 07 '22 18:07 bluekeyes

this can be closed.

Sep 30 '22 19:09 devinburnette

Implemented in #454 and released as part of 1.27.0

Sep 30 '22 19:09 bluekeyes

policy-bot policy-bot copied to clipboard

Feature: Include PR Body in Approval Candidates

Option 1: Body Predicate

Option 2: New Approval Method

Option 1 Follow up questions

Option 2 Follow up questions

policy-bot
policy-bot copied to clipboard