Regex in rm_broken_tests.pl causes incorrect method replacement for inherited test methods
Description
The rm_broken_tests.pl script in relies on regular expressions to identify closing curly braces and insert empty method stubs instead of finding and replacing declaration of the inherited test methods.
However, its current implementation contains two critical issues that cause failures in accurately locating closing curly braces, especially when:
- The line containing the closing brace has no trailing newline, and
- Multiple inherited test methods are present (multiple empty methods are required to inserted at the end of a same file).
The core logic attempts to locate the end of a class by matching the following regex: https://github.com/rjust/defects4j/blob/b710dec28986dfe618ef2d4696abecce00d119d9/framework/util/rm_broken_tests.pl#L301
This pattern assumes a non-} character ([^}]) follows the closing brace, which fails when the line ends with the brace (no newline or EOF). As a result, the pattern does not match and the script fails to identify the insertion point for the empty method stub.
Additionally, when multiple inherited methods exist, the script replaces the line at $index with code block containing multiple line breaks (for the first method) but continues searching from the same index to insert empty stub for the next inherited method. Since the buffer line now includes additional newlines ($1\n$override}$2), the pattern no longer matches, and empty method stubs of subsequent inherited methods are not inserted.
Known Affected Bugs
- Lang-9, Lang-10: org/apache/commons/lang3/time/FastDateFormat_ParserTest.java
- Mockito-12 to Mockito-17 and Mockito-25 to Mockito-38:
- org/mockito/verification/TimeoutTest.java
- org/mockitousage/verification/VerificationWithTimeoutTest.java
Proposed Fixes
-
Regex Patch: Update the regex pattern to
^(.*)}([^}]*)$to tolerate missing trailing characters -
Index Reset: Re-index the $buffer after each insertion to ensure closing curly braces can be detected for subsequent inherited methods.
Larger Recommendation
This script's heavy reliance on line-level regex introduces fragility, leading to other bugs such as:
- Insertion of incorrect comment content (e.g. Cli-1, org/apache/commons/cli/BugsTest.java)
- Inclusion of unintended code elements (e.g.
@Testannotations due to$spacenot always being whitespace) https://github.com/rjust/defects4j/blob/b710dec28986dfe618ef2d4696abecce00d119d9/framework/util/rm_broken_tests.pl#L225-L228
For robust and maintainable behavior, consider moving to a character-level or AST-based replacement strategy. If avoiding external dependencies is a concern, the javac compiler APIs (com.sun.source.util.*, com.sun.source.tree.*) offer an accurate and dependency-free way to parse and modify Java source files.
I have tested such an approach and found it significantly more reliable for precise method replacement.
@universetraveller are you willing to send a PR with a more robust test-replacement approach?
Hi @rjust , I can probably work on this during the winter recess.
This issue came up when I was reimplementing Defects4J in Python. Right now, the fix only exists in my python utility class, not in java. I do have a small java class that confirms the fix works, but it’s not a complete or robust Java implementation yet.
Another concern is that the current script is executed directly like a command. If it is rewritten in java, the java source can’t be called directly without compiling it first, and adding binary files like classes and jars to the repository doesn’t seem ideal.
Do you have approaches on how to integrate both languages in this repository? Maybe it could be added as a new task to framework/projects/defects4j.build.xml, but I’m not sure if that’s the best approach.