sentry icon indicating copy to clipboard operation
sentry copied to clipboard

feat(suspect-resolutions): Scan all issues in a project for a suspect resolution

Open komal-saini opened this issue 3 years ago • 4 comments

Context

Similar to suspect commits, we want to design an algorithm to suggest suspect resolutions given an issue that has been resolved. To determine a potential suspect resolution, we are comparing the commit data (commit_correlation.py) and error-rate data (metric_correlation.py) between a resolved issue and a candidate suspect resolution.

In order to evaluate the success of our algorithm, we want to calculate the percentage of true and false positives, as well as the percentage of true and false negatives. Each time an issue is resolved, we want to fire an analytics event to track the candidate suspect resolutions and actual suspect resolutions identified by the algorithm.

Read more about the project here.

Changes

metric_correlation.py Metric data is correlated by checking if the error-rates of the resolved issue and the candidate suspect resolution are correlated 8h before and after the time of resolution using the Pearson correlation coefficient. If there’s a drop in errors for both issues as a result of a release that has been deployed, we can draw conclusions on whether the resolution of one issue can result in the resolution of the candidate issue.

  • is_issue_error_rate_correlated() takes a resolved_issue and a candidate_suspect_resolution and returns whether the value of the Pearson correlation coefficient is above 0.4 (indicating a strong positive association between the issues)
  • calculate_pearson_correlation_coefficient() takes two lists of integers and calculates their Pearson correlation coefficient, returns an integer value that is indicative of their strength of correlation

commit_correlation.py [merged] Commit data is correlated by checking if any of the files changed by the resolved issue are also changed by the candidate suspect resolution.

  • is_issue_commit_correlated() takes a resolved_issue_id and a candidate_issue_id and returns whether they share changed files
  • get_files_changed() takes in an issue_id and a project_id and returns a set of all the files changed across all the releases of the given issue_id

get_suspect_resolutions.py This function calls is_issue_commit_correlated() and is_issue_error_rate_correlated() to determine suspect resolutions.

  • get_suspect_resolutions() takes a resolved_issue and returns a list of correlated issue_ids

resolved_in_active_release.py

  • is_resolved_issue_within_active_release() takes an issue and a project and returns whether the release associated with the issue was deployed within the last hour

analytics.py

  • Added two new events: suspect_resolution.ids and candidate_group.ids to track all the issues in a given project, and issues that were picked up as suspect resolutions

Tests

test_metric_correlation.py

  • test_correlated_issues(): Tests is_issue_error_rate_correlated() on two issues that experience a drop in the number of events at around the same time
  • test_uncorrelated_issues(): Tests is_issue_error_rate_correlated() on two issues that are unrelated
  • test_perfect_correlation(): Tests is_issue_error_rate_correlated() on two issues that with the exact same time-series data
  • test_custom_calculation_against_pearsonr(): Tests that calculate_pearson_correlation_coefficient() returns the same Pearson correlation coefficient r-value as scipy.stats.pearsonr()

test_commit_correlation.py [merged]

  • test_get_files_changed(): Check if get_files_changed() returns all changed files
  • test_is_issue_commit_correlated_with_shared_files(): Check if is_issue_commit_correlated() returns True given two issues that share changed files
  • test_is_issue_commit_correlated_no_shared_files(): Check if is_issue_commit_correlated() returns False given two issues that do not share changed files

test_resolved_in_active_release.py

  • test_resolved_issue_in_active_release(): Checks that is_resolved_issue_within_active_release() returns True given an issue that is resolved within an active release, and False otherwise
  • test_unresolved_issue_in_active_release(): Checks that is_resolved_issue_within_active_release() returns False given an issue that has not been resolved
  • test_resolved_issue_in_old_deploy(): Checks that is_resolved_issue_within_active_release() returns False given an issue that was resolved in a release that was deployed more than 1 hour ago
  • test_resolved_issue_in_active_release_not_deployed(): Checks that is_resolved_issue_within_active_release() returns False given an issue that was resolved in a release that has not been deployed

test_get_suspect_resolutions.py

  • test_get_suspect_resolutions(): Tests get_suspect_resolutions() on two issues that are correlated by both commit data and metric data
  • test_get_suspect_resolutions_issue_not_resolved(): Tests get_suspect_resolutions() on an unresolved issue and on an ignored issue

komal-saini avatar Jul 20 '22 18:07 komal-saini

Copied over the files associated with metric/commit correlation for now since they haven't been merged in yet

Hmm, instead of doing this, what you can do is create a branch based off your komal/integrate-into-sentry branch (I assume this has the most up-to-date changes that are under review.

So something like this: komal/scan-all-project-issues merge into komal/integrate-into-sentry, instead of having komal/suspect-resolutions be the base for all your PRs.

I find the most useful ways of using PRs is the diffs between branches and segmenting related bits of code to be digested by the PR reviewers. If you have to copy and paste files from different branches, I would tend to think thats a code smell (or workflow smell lol) and try to find a different approach.

Notice how this PR diff includes a bunch of un-related file changes since komal/suspect-resolutions branch is waiting PRs to complete.

Could you change the target merge branch from komal/suspect-resolutions to komal/integrate-into-sentry

barkbarkimashark avatar Jul 20 '22 22:07 barkbarkimashark

Didn't review everything, but I think its in a good spot to merge.

We should try to get a good experiment workflow going in prod to verify some of our assumptions by mid next week.

barkbarkimashark avatar Aug 02 '22 23:08 barkbarkimashark

@komal-saini This is ready to go from my point of view, just want to make sure that this isn't called from anywhere until https://github.com/getsentry/sentry/pull/37378 merges?

wedamija avatar Aug 05 '22 23:08 wedamija

@komal-saini This is ready to go from my point of view, just want to make sure that this isn't called from anywhere until #37378 merges?

@wedamija Yeah, that's the only place it's being called. I'm just waiting on getting access to reload/etl to add the analytics event and then will merge this in.

komal-saini avatar Aug 06 '22 00:08 komal-saini