
Verify sensitivity of Glitchwitcher REPD approach by inserting mutations into draft PRs

Open smlambert opened this issue 4 months ago • 1 comments

Now that there is a way to run the REPD approach against a set of changed files (in a PR), let us also check how 'sensitive' this approach is to different types of mutations and code changes that we orchestrate.

Let's select various files from the OpenJ9/OpenJDK repos and make our own "bad changes" by applying several mutators from this list: https://pitest.org/quickstart/mutators/ to see what % score the draft PR receives. pitest generates mutations for Java code, but many of these mutations can just be manually applied to any codebase. One could also create a draft PR that applies several mutations across several files.

| Mutator(s) | File(s) mutated | Resulting % score |
| --- | --- | --- |
| Increments Mutator | https://github.com/eclipse-openj9/openj9/blob/master/runtime/gc_base/ContinuationObjectList.cpp#L58 | TBD |
| Negate Conditionals Mutator | https://github.com/eclipse-openj9/openj9/blob/master/runtime/j9vm/javanextvmi.cpp#L228 | TBD |
| ... | ... | ... |
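
Since the pitest mutators only act on Java, applying them to C/C++ sources means doing the edit by hand or with a small script. Below is a minimal sketch of the Negate Conditionals swap done as a text substitution. The helper name `negate_conditionals` and the sample lines are hypothetical, and the regex is a heuristic: it skips `->`, `<<`, and `>>`, but it can still hit template angle brackets, so each mutated line should be reviewed before it goes into a draft PR.

```python
import re

# Operator swaps documented for the pitest Negate Conditionals mutator.
NEGATED = {"==": "!=", "!=": "==", "<=": ">", ">=": "<", "<": ">=", ">": "<="}

# Match comparison operators only. The lookarounds skip "->", "<<", ">>" and
# compound tokens such as "<<=". This is a text-level heuristic, not a parser.
_COMPARISON = re.compile(r"(?<![<>=!\-])(==|!=|<=|>=|<|>)(?![<>=])")


def negate_conditionals(line: str) -> str:
    """Return the line with every matched comparison operator negated."""
    return _COMPARISON.sub(lambda m: NEGATED[m.group(1)], line)


if __name__ == "__main__":
    # Hypothetical C++ line, not taken from the OpenJ9 sources.
    print(negate_conditionals("if (count < limit && ptr != NULL)"))
```

The same pattern (a lookup table plus one substitution pass) would extend to other operator-level mutators such as Conditionals Boundary by changing the mapping.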

smlambert avatar Aug 22 '25 16:08 smlambert

TL;DR

  • I injected a range of PIT-style mutations (boundary, logical, math, returns, side-effect removal) into anirudhsengar/OpenJ9 via PRs, then ran the REPD approach against those PRs.
  • Mutations that short-circuit behavior or remove side effects (e.g., Empty returns, Void method call removal, forcing returns) cause the largest score increases, meaning REPD is most sensitive to structural/semantic breakages that obviously degrade correctness and resource handling.
  • Subtle control-flow adjustments (boundary flips, increments, negations, math tweaks) move the needle only slightly.

Results

| Mutator | Files Changed | Average Defective % Change | Average Non-Defective % Change |
| --- | --- | --- | --- |
| Conditionals Boundary Mutator | https://github.com/anirudhsengar/openj9/pull/2/files | -0.024% | -0.037% |
| Increments Mutator | https://github.com/anirudhsengar/openj9/pull/3/files | +0.00% | +0.00% |
| Negatives Mutator | https://github.com/anirudhsengar/openj9/pull/4/files | +0.077% | +0.138% |
| Math Mutator | https://github.com/anirudhsengar/openj9/pull/5/files | -0.059% | -0.106% |
| Negate Conditionals Mutator | https://github.com/anirudhsengar/openj9/pull/6/files | +0.019% | +0.028% |
| Return Values Mutator | https://github.com/anirudhsengar/openj9/pull/7/files | +1.774% | +2.913% |
| Void Method Call Mutator | https://github.com/anirudhsengar/openj9/pull/8/files | +11.536% | +18.082% |
| Empty returns Mutator | https://github.com/anirudhsengar/openj9/pull/9/files | +21.191% | +55.363% |
| False returns Mutator | https://github.com/anirudhsengar/openj9/pull/10/files | +0.067% | +0.111% |
| True returns Mutator | https://github.com/anirudhsengar/openj9/pull/11/files | +3.667% | +6.459% |
| Null returns Mutator | https://github.com/anirudhsengar/openj9/pull/12/files | +3.831% | +5.674% |
| Primitive returns Mutator | https://github.com/anirudhsengar/openj9/pull/13/files | +5.378% | +8.889% |

Notes:

  • “Defective/Non-Defective % Change” reflects how the REPD score moved on those classes after the mutation relative to baseline on the same files.

What the results suggest about REPD sensitivity

  1. Biggest signals: removing behavior and short-circuiting flow
  • Empty returns (+21.19% / +55.36%): Exiting methods early (e.g., returning NULL/nullptr or returning before the body runs) produces large, consistent structural damage, and REPD flags it strongly.
  • Void Method Call removal (+11.54% / +18.08%): Eliminating calls that have side effects (e.g., cleanup/close, permission checks, tracepoints, synchronization) materially alters program semantics. REPD consistently treats this as high risk.
  • Forced return constants (True/Null/Primitive returns; Return Values): These increasingly “freeze” dynamic paths and error handling. The higher the probability of suppressing failures or misreporting state, the larger the REPD bump (up to +5.38% defective / +8.89% non-defective for Primitive returns).
  2. Moderate signals: blatant control-flow forcing
  • True returns (+3.67% / +6.46%) and the Null/Primitive return families: These mutations steer code into atypical paths or failure modes (e.g., reporting success, skipping checks, returning invalid pointers), which REPD scores as materially riskier than baseline. False returns is the exception: it moved scores by only +0.067% / +0.111%, which puts it with the small-signal group.
  3. Small/no signal: micro-control-flow and arithmetic tweaks
  • Conditionals boundary, increments, negate conditionals, negatives, math: each moved the average score by less than 0.15% in either direction (increments by 0.00%). REPD tends to treat them as low-risk noise.
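
The grouping above can be reproduced mechanically from the Results table. The sketch below buckets each mutator by the magnitude of its average defective % change. The numbers are copied from the table; the cut-offs (10% for "large", 1% for "moderate") and the function names are assumptions chosen for illustration and are not part of REPD.

```python
# (average defective % change, average non-defective % change) per mutator,
# copied from the Results table.
RESULTS = {
    "Conditionals Boundary": (-0.024, -0.037),
    "Increments": (0.00, 0.00),
    "Negatives": (0.077, 0.138),
    "Math": (-0.059, -0.106),
    "Negate Conditionals": (0.019, 0.028),
    "Return Values": (1.774, 2.913),
    "Void Method Call": (11.536, 18.082),
    "Empty returns": (21.191, 55.363),
    "False returns": (0.067, 0.111),
    "True returns": (3.667, 6.459),
    "Null returns": (3.831, 5.674),
    "Primitive returns": (5.378, 8.889),
}


def tier(delta: float, large: float = 10.0, moderate: float = 1.0) -> str:
    """Classify a delta by absolute size; both thresholds are assumed."""
    magnitude = abs(delta)
    if magnitude >= large:
        return "large"
    if magnitude >= moderate:
        return "moderate"
    return "small"


def tiers_by_defective_delta(results):
    """Group mutator names by tier, largest defective delta first."""
    grouped = {"large": [], "moderate": [], "small": []}
    ordered = sorted(results.items(), key=lambda kv: -abs(kv[1][0]))
    for name, (defective, _non_defective) in ordered:
        grouped[tier(defective)].append(name)
    return grouped


if __name__ == "__main__":
    for label, names in tiers_by_defective_delta(RESULTS).items():
        print(f"{label}: {', '.join(names)}")
```

With these assumed thresholds, False returns lands in the "small" bucket alongside the arithmetic and boundary mutators, while the other forced-return mutators land in "moderate".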

Methodology (high level)

  • For each mutator, I created a PR in my OpenJ9 fork that applied representative mutations across 10 files.
  • I then ran the REPD approach against each PR to compute how its scoring changed on the affected classes compared to baseline.
  • I aggregated the deltas to the “Average % Change” values shown above.
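
The aggregation step can be sketched as follows. The thread does not say whether "% change" is a relative change or a difference in percentage points, so this example assumes a relative change, `(mutated - baseline) / baseline * 100`, averaged over the files in a PR. The function names, file names, and scores are hypothetical.

```python
from statistics import mean


def percent_change(baseline: float, mutated: float) -> float:
    """Relative change in percent (an assumed definition of '% change')."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (mutated - baseline) / baseline * 100.0


def average_percent_change(scores: dict) -> float:
    """Average the per-file change over {file: (baseline, mutated)} pairs."""
    return mean(percent_change(b, m) for b, m in scores.values())


if __name__ == "__main__":
    # Hypothetical per-file scores for one mutator's PR.
    example = {
        "FileA.cpp": (0.40, 0.44),
        "FileB.cpp": (0.50, 0.50),
    }
    print(f"{average_percent_change(example):+.3f}%")
```

If the reported values are instead percentage-point differences, `percent_change` would reduce to `mutated - baseline` and the averaging step would stay the same.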

Takeaway

  • REPD is most sensitive to mutations that remove behavior or force outcomes (empty returns, void method call removal, hardcoded return values) and least sensitive to small arithmetic and boundary changes.

anirudhsengar avatar Aug 27 '25 15:08 anirudhsengar