Verify sensitivity of Glitchwitcher REPD approach by inserting mutations into draft PRs

Open smlambert opened this issue 4 months ago • 1 comments

Now that there is a way to run the REPD approach against a set of changed files (in a PR), let us also check how 'sensitive' this approach is to different types of mutations and code changes that we orchestrate.

Let's select various files from the OpenJ9/OpenJDK repos and make our own "bad changes" by applying several mutators from this list: https://pitest.org/quickstart/mutators/ to see what % score the draft PR receives. pitest generates mutations for Java code, but many of these mutations can just be manually applied to any codebase. One could also create a draft PR that applies several mutations across several files.

Mutator(s)	File(s) mutated	Resulting % score
Increments Mutator	https://github.com/eclipse-openj9/openj9/blob/master/runtime/gc_base/ContinuationObjectList.cpp#L58	TBD
Negate Conditionals Mutator	https://github.com/eclipse-openj9/openj9/blob/master/runtime/j9vm/javanextvmi.cpp#L228	TBD
...	...	...
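
As a rough illustration of applying one of these mutators by hand to a non-Java file, here is a minimal Python sketch of the Negate Conditionals swap on a single source line. The operator mapping follows the PIT mutator list linked above; the helper name `negate_first_conditional` and the regex are assumptions made for this example, not part of pitest or of the REPD tooling.

```python
import re

# Operator swaps modeled on PIT's "Negate Conditionals" mutator.
NEGATED = {"==": "!=", "!=": "==", "<=": ">", ">=": "<", "<": ">=", ">": "<="}

# Two-character operators come first so "<=" is not read as "<".
# The lookarounds skip "->", "<<" and ">>"; template brackets such as
# "vector<int>" are NOT handled, so review every mutated line by hand.
RELATIONAL = re.compile(r"(?<![-<>=!])(==|!=|<=|>=|<|>)(?![<>=])")


def negate_first_conditional(line: str) -> str:
    """Return `line` with its first relational operator negated (hypothetical helper)."""
    return RELATIONAL.sub(lambda m: NEGATED[m.group(1)], line, count=1)


print(negate_first_conditional("if (count < limit) {"))
print(negate_first_conditional("if (list->size == 0) {"))
```

Mutating only the first operator on a line keeps each change small and easy to review in a draft PR; lines without a relational operator come back unchanged.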

Aug 22 '25 16:08 smlambert

TL;DR

I injected a range of PIT-style mutations (boundary, logical, math, returns, side-effect removal) into my OpenJ9 fork (anirudhsengar/openj9) via PRs, then ran the REPD approach against those PRs.
Mutations that short-circuit behavior or remove side effects (e.g., Empty returns, Void method call removal, forcing returns) cause the largest score increases, meaning REPD is most sensitive to structural/semantic breakages that obviously degrade correctness and resource handling.
Subtle control-flow adjustments (boundary flips, increments, negations, math tweaks) move the needle only slightly.

Results

Mutator	Files Changed	Average Defective % Change	Average Non-Defective % Change
Conditionals Boundary Mutator	https://github.com/anirudhsengar/openj9/pull/2/files	-0.024%	-0.037%
Increments Mutator	https://github.com/anirudhsengar/openj9/pull/3/files	+0.00%	+0.00%
Negatives Mutator	https://github.com/anirudhsengar/openj9/pull/4/files	+0.077%	+0.138%
Math Mutator	https://github.com/anirudhsengar/openj9/pull/5/files	-0.059%	-0.106%
Negate Conditionals Mutator	https://github.com/anirudhsengar/openj9/pull/6/files	+0.019%	+0.028%
Return Values Mutator	https://github.com/anirudhsengar/openj9/pull/7/files	+1.774%	+2.913%
Void Method Call Mutator	https://github.com/anirudhsengar/openj9/pull/8/files	+11.536%	+18.082%
Empty returns Mutator	https://github.com/anirudhsengar/openj9/pull/9/files	+21.191%	+55.363%
False returns Mutator	https://github.com/anirudhsengar/openj9/pull/10/files	+0.067%	+0.111%
True returns Mutator	https://github.com/anirudhsengar/openj9/pull/11/files	+3.667%	+6.459%
Null returns Mutator	https://github.com/anirudhsengar/openj9/pull/12/files	+3.831%	+5.674%
Primitive returns Mutator	https://github.com/anirudhsengar/openj9/pull/13/files	+5.378%	+8.889%

Notes:

“Defective/Non-Defective % Change” reflects how the REPD score moved on those classes after the mutation relative to baseline on the same files.

What the results suggest about REPD sensitivity

Biggest signals: removing behavior and short-circuiting flow

Empty returns (+21.19% / +55.36%): Exiting a method early (e.g., returning NULL/nullptr or returning prematurely) produces large, consistent structural damage, and REPD strongly flags it.
Void Method Call removal (+11.54% / +18.08%): Eliminating calls that have side effects (e.g., cleanup/close, permission checks, tracepoints, synchronization) materially alters program semantics. REPD consistently treats this as high risk.
Forced return constants (True/Null/Primitive returns; Return Values): These increasingly “freeze” dynamic paths and error handling. The higher the probability of suppressing failures or misreporting state, the larger the REPD bump (up to +5.38% / +8.89% for Primitive returns).

Moderate signals: blatant control-flow forcing

True/False returns (esp. True returns), and Null/Primitive return families: These mutations steer code into atypical paths or failure modes (e.g., reporting success, skipping checks, returning invalid pointers), which REPD catches as materially riskier than baseline. False returns is the exception: at +0.067% / +0.111% it barely moved the score.

Small/no signal: micro-control-flow and arithmetic tweaks

Conditionals boundary, increments, negate conditionals, negatives, math: each of these moved the score by less than 0.14% in either direction. REPD tends to treat them as low-risk noise, hence the tiny score deltas.
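
To make the ordering behind these groups easier to check, the sketch below sorts the rows of the Results table by their defective-column delta. The numbers are copied from the table; the function name `rank_by_defective_delta` is only an illustrative choice, and the grouping into big/moderate/small signals remains a judgment call rather than something the code decides.

```python
# (mutator, defective delta %, non-defective delta %), copied from the Results table.
RESULTS = [
    ("Conditionals Boundary", -0.024, -0.037),
    ("Increments", 0.0, 0.0),
    ("Negatives", 0.077, 0.138),
    ("Math", -0.059, -0.106),
    ("Negate Conditionals", 0.019, 0.028),
    ("Return Values", 1.774, 2.913),
    ("Void Method Call", 11.536, 18.082),
    ("Empty returns", 21.191, 55.363),
    ("False returns", 0.067, 0.111),
    ("True returns", 3.667, 6.459),
    ("Null returns", 3.831, 5.674),
    ("Primitive returns", 5.378, 8.889),
]


def rank_by_defective_delta(rows):
    """Return the rows sorted from largest to smallest defective-column delta."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, defective, non_defective in rank_by_defective_delta(RESULTS):
    print(f"{name:<22} {defective:+8.3f}% {non_defective:+8.3f}%")
```

Sorted this way, Empty returns, Void Method Call, and Primitive returns lead the list, while the boundary, increment, negation, and math mutators cluster around zero.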

Methodology (high level)

For each mutator, I created a PR in my OpenJ9 fork that applied representative mutations across 10 files.
I then ran the REPD approach against each PR to compute how its scoring changed on the affected classes compared to baseline.
I aggregated the per-file deltas into the “Average % Change” values shown above.
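
The aggregation step could look roughly like the sketch below. It assumes that "% change" means the relative change of a per-file score against its baseline, averaged over the mutated files; the exact formula used for the table is not spelled out in this thread, so treat both function names and the sample scores as hypothetical.

```python
from statistics import mean


def percent_change(baseline: float, mutated: float) -> float:
    """Relative change of a score versus its baseline, in percent (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (mutated - baseline) / baseline * 100.0


def average_percent_change(score_pairs) -> float:
    """Average the per-file percent changes for one mutator PR."""
    return mean(percent_change(b, m) for b, m in score_pairs)


# Hypothetical (baseline, mutated) scores for two files; not real REPD output.
example_pairs = [(2.0, 3.0), (4.0, 4.0)]
print(average_percent_change(example_pairs))
```

If the reported deltas are absolute differences in percentage points instead, `percent_change` would reduce to `mutated - baseline` and the averaging would stay the same.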

Takeaway

REPD is most sensitive to mutations that remove behavior or force outcomes (empty returns, void method call removal, hardcoded return values), and far less sensitive to boundary, increment, negation, and math tweaks.

Aug 27 '25 15:08 anirudhsengar