JPlag
End To End Test Suite extended
@jplag/maintainer @jplag/studdev
Extension Of The End To End Module
The extension of the EndToEnd tests includes the following new features/Fixes:
- (Fix) Identification in json changed to SHA256 hash
- (feature) extension of the comparative values as discussed in the issue End to end testing - "comparative values"
- (Fix) made json more flexible to better store results per test function
- current json format looks like this :
[...]
"function_name" : "normalizationLevelTest",
"test_results" : [ {
"result_similarity" : 100.0,
"result_minimal_similarity" : 100.0,
"result_maximum_similarity" : 100.0,
"result_matched_token_number" : 56,
"test_identifier" : "85FF00F531A497F002D40E9C8430CB159EFC40E618925FD757E35EB2A533227E"
},
[...]
- (feature) Adjusted the use of paths in a mapper, LanguageToPathMapper.java, so that languages can be added flexibly
- (feature) Created the memory tests, which insert the temporary results into the memory json file. This is necessary so that changes made to languages can be saved automatically and the endToEnd tests run successfully again after a change to the recognition.
- (feature) Documentation written to the README.md, including a step-by-step guide on how to create new endToEnd tests for languages
- (feature) Extended the endToEnd test for java
- (feature) Made it possible to create a quick overall test with the help of JPlagTestSuiteHelper.java
[...]
@Test
void overAllTests() throws IOException, ExitException, NoSuchAlgorithmException {
String[] testClassNames = jplagTestSuiteHelper.getAllTestFileNames();
runJPlagTestSuite(testClassNames);
}
[...]
For more necessary information please have a look at the README.md file.
Regarding the build error, I am already looking into the problem. I also have a local problem building the project, since the Scala module throws a reference error for me.
There seems to be a problem with comparing the SHA256 hashes created on my Windows machine and the ones created by the Unix build.
That may be related to line endings (\r\n on Windows, \n on Unix). How did you create the SHA? (In Java or locally?)
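If the line endings are indeed the cause, normalizing them before hashing would make the hash independent of the platform. A minimal sketch, assuming the hash is computed in Java over the file content as text; `LineEndingHash` and its methods are hypothetical names, not existing JPlag code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper, not part of JPlag: hashes text independently of the platform's line endings. */
class LineEndingHash {

    /** Converts Windows (\r\n) and lone carriage return (\r) line endings to Unix (\n). */
    static String normalizeLineEndings(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Returns the upper-case hex SHA-256 digest of the normalized text. */
    static String sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalizeLineEndings(text).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().withUpperCase().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String windows = "class A {\r\n}\r\n";
        String unix = "class A {\n}\n";
        // Both variants hash to the same value once the line endings are normalized
        System.out.println(sha256(windows).equals(sha256(unix)));
    }
}
```

`HexFormat` requires Java 17 or newer. Hashing the raw bytes of a file that git checked out with converted line endings would still differ between platforms, which is why the sketch normalizes first.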
Meeting summary
- Adjust the identifier and use the test data names instead of the SHA256 hash
- Remove the "result_similarity" field in the json file, as this is composed of the max and min similarity.
- Extend with the possibility to run other test configurations.
- View JUnit DynamicTest and its use in the OverAllTests cases.
https://www.baeldung.com/junit5-dynamic-tests
Here is an example. In the best case the dynamic tests would be generated nested, but I didn't find this feature with a quick search. But I think it would be fine to have one test factory per test suite which simply generates the tests. If one adds a test suite in the future, the overhead of adding that one method is acceptably low. By generated nested, I mean a test structure like this:
- Test Suite 1
- Test A1 <-> A2
- Test A1 <-> A3
- Test A2 <-> A3
- Test Suite 2
- Test B1 <-> B2
- ...
Meeting summary
DONE:
- [X] Adjust the identifier and use the test data names instead of the SHA256 hash
- [X] Remove the "result_similarity" field in the json file as this is composed of the max and min similarity.
TODO:
- [x] Extend with the possibility to run other test configurations.
- [ ] View JUnit DynamicTest and its use in the OverAllTests cases.
I found something here about the nested junit5 dynamic tests which were mentioned in the meeting. I'll have a look at the topic and then make changes if necessary.
The implemented code for the Dynamic tests could look something like this:
@TestFactory
Stream<DynamicTest> dynamicTestsWithIterator() {
//test cases calculated with (n over k)
var fileNames = jplagTestSuiteHelper.getAllTestFileNames();
ArrayList<String[]> testCases = new ArrayList<>();
int outerCounter = 1;
for (String fileName : fileNames) {
for (int counter = outerCounter; counter < fileNames.length; counter++) {
testCases.add(new String[] { fileName, fileNames[counter] });
}
outerCounter++;
}
return testCases.stream().map(dom -> DynamicTest.dynamicTest("Testing: " + dom[0] + " " + dom[1], () -> {
runJPlagTestSuite(dom);
jplagTestSuiteHelper.clear();
}));
}
- This means that jplagTestSuiteHelper.clear() must be called explicitly inside each test case.
- Also the test time increases from about 10 seconds to over 30. If a language has even more test data, this could lead to a significantly increased test runtime.
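To get a feeling for the growth: every pair of files becomes its own test, so n files yield n(n-1)/2 tests. A small sketch that mirrors the pairing loop above; the class `PairCount` and the file names are made up for the illustration:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: builds all unordered pairs of file names, like the test factory does. */
class PairCount {

    /** Returns every unordered pair {a, b} where a comes before b in the input. */
    static List<String[]> allPairs(String[] fileNames) {
        List<String[]> pairs = new ArrayList<>();
        for (int first = 0; first < fileNames.length; first++) {
            for (int second = first + 1; second < fileNames.length; second++) {
                pairs.add(new String[] { fileNames[first], fileNames[second] });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] fileNames = { "A.java", "B.java", "C.java", "D.java" };
        for (String[] pair : allPairs(fileNames)) {
            System.out.println("Testing: " + pair[0] + " " + pair[1]);
        }
        // 4 files result in 6 tests, 5 files in 10, 10 files already in 45
        System.out.println(allPairs(fileNames).size() + " tests");
    }
}
```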
To avoid having to write a separate TestExecutionListener, the test case is now handled with try-finally:
return testCases.stream().map(testCase -> DynamicTest.dynamicTest("Testing: " + testCase[0] + " " + testCase[1], () -> {
try {
runJPlagTestSuite(testCase, functionName);
} finally {
jplagTestSuiteHelper.clear();
}
}));
Also the test time increases from about 10 seconds to over 30. If a language has even more test data, this could lead to a significantly increased test runtime.
The end-to-end tests are only supposed to run for the Java language frontend. So I think this should be fine.
First of all, thank you for the detailed comments and remarks. Based on the comments I will have to come up with a new structure for the json data and the tests. I hope there is enough time for me to rebuild everything. If I have questions or comments, I will simply create a new issue.
I tried to help you save some time by already providing an adjusted JSON scheme
[{ "options": {"minimum_match_length" : <value>}, "tests": { <identifier1> : {"min_similarity" : <value>, [...other values...] }, <identifier2> : {...}, [...] } }, {...}]
So the model would be something like (pseudo-code) this:
ResultDescription {
Options options;
Map<String, ExpectedResult> identifierToResultMap;
Options {
int minimumMatchLength;
}
ExpectedResult {
int minSimilarity;
int maxSimilarity;
int numberOfMatchedTokens;
}
}
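In Java this could be sketched with plain records. The names follow the pseudo-code above; the lookup helper `expectedFor` and the sample identifier are hypothetical, the similarities are assumed to be `float` because the stored values are fractional, and the JSON mapping is left out on purpose:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Sketch of the proposed model without any JSON library; all names are illustrative. */
class ResultModelSketch {

    record Options(int minimumMatchLength) {}

    // float instead of int is an assumption, since similarities are stored as e.g. 100.0
    record ExpectedResult(float minSimilarity, float maxSimilarity, int numberOfMatchedTokens) {}

    record ResultDescription(Options options, Map<String, ExpectedResult> identifierToResultMap) {}

    /** Finds the expected result for the given options and test identifier, if one is stored. */
    static Optional<ExpectedResult> expectedFor(List<ResultDescription> stored, Options options, String identifier) {
        return stored.stream()
                .filter(description -> description.options().equals(options))
                .map(description -> description.identifierToResultMap().get(identifier))
                .filter(Objects::nonNull)
                .findFirst();
    }

    public static void main(String[] args) {
        ExpectedResult expected = new ExpectedResult(52.6f, 53.3f, 40);
        List<ResultDescription> stored = List.of(new ResultDescription(new Options(15), Map.of("A_B", expected)));
        System.out.println(expectedFor(stored, new Options(15), "A_B"));
        System.out.println(expectedFor(stored, new Options(1), "A_B"));
    }
}
```

Grouping by options first keeps one entry per configuration, so a new configuration only adds one more element to the outer list.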
Hope that helps.
Concept code only! It has not been cleaned up yet.
After I rewrote the module, you get a json file like the one below, named sortAlgo.json. It is important to note that new language tests no longer have to be added manually! As soon as a folder named after the language exists in the resources folder and contains test data, it is picked up by the tests. The top-level folder must be named exactly like the result of toString on the language option. New tests are also generated from these new folders.
public static Map<LanguageOption, Map<String, Path>> getAllLanguageResources() {
String[] languageDirectoryNames = FileHelper
.getAllDirectoriesInPath(TestDirectoryConstants.BASE_PATH_TO_LANGUAGE_RESOURCES);
List<LanguageOption> languageInPathList = FileHelper.getLanguageOptionsFromPath(languageDirectoryNames);
var returnMap = new HashMap<LanguageOption, Map<String, Path>>();
for (LanguageOption languageOption : languageInPathList) {
var tempMap = new HashMap<String, Path>();
var allDirectoriesInPath = FileHelper.getAllDirectoriesInPath(Path
.of(TestDirectoryConstants.BASE_PATH_TO_LANGUAGE_RESOURCES.toString(), languageOption.toString()));
Arrays.asList(allDirectoriesInPath)
.forEach(directory -> tempMap.put(Path.of(directory).toFile().getName(),
Path.of(TestDirectoryConstants.BASE_PATH_TO_LANGUAGE_RESOURCES.toString(),
languageOption.toString(), directory)));
returnMap.put(languageOption, tempMap);
}
return returnMap;
}
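The naming rule can be illustrated in isolation. The following sketch only uses the JDK; `Language` is a hypothetical stand-in for `LanguageOption`, and it is assumed that a folder matches when its name equals the `toString()` value of an enum constant:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Stream;

class LanguageDirectories {

    /** Hypothetical stand-in for JPlag's LanguageOption enum. */
    enum Language { JAVA, PYTHON_3 }

    /** Maps each language to its resource directory; directories without a matching language are skipped. */
    static Map<Language, Path> discover(Path resourceRoot) {
        Map<Language, Path> result = new EnumMap<>(Language.class);
        try (Stream<Path> children = Files.list(resourceRoot)) {
            children.filter(Files::isDirectory).forEach(directory -> {
                String name = directory.getFileName().toString();
                Arrays.stream(Language.values())
                        .filter(language -> language.toString().equals(name))
                        .findFirst()
                        .ifPresent(language -> result.put(language, directory));
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Creates a temporary resource tree with one valid and one unrelated folder. */
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("resources");
            Files.createDirectory(root.resolve("JAVA"));
            Files.createDirectory(root.resolve("notALanguage"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Only the JAVA folder is picked up, the other one is ignored
        System.out.println(discover(createSampleTree()).keySet());
    }
}
```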
result json for a test folder:
[ {
"options" : {
"minimum_token_match" : 1
},
"tests" : {
"SortAlgo1_2_SortAlgo1_3" : {
"result_minimal_similarity" : 82.89474,
"result_maximum_similarity" : 84.0,
"result_matched_token_number" : 63
},
"SortAlgo1_2_SortAlgo1_4" : {
"result_minimal_similarity" : 76.31579,
"result_maximum_similarity" : 98.305084,
"result_matched_token_number" : 58
},
"SortAlgo1_3_SortAlgo1_4" : {
"result_minimal_similarity" : 77.333336,
"result_maximum_similarity" : 98.305084,
"result_matched_token_number" : 58
},
"SortAlgo_SortAlgo1_2" : {
"result_minimal_similarity" : 73.68421,
"result_maximum_similarity" : 100.0,
"result_matched_token_number" : 56
},
"SortAlgo_SortAlgo1_4" : {
"result_minimal_similarity" : 94.91525,
"result_maximum_similarity" : 100.0,
"result_matched_token_number" : 56
},
"SortAlgo_SortAlgo1_3" : {
"result_minimal_similarity" : 74.666664,
"result_maximum_similarity" : 100.0,
"result_matched_token_number" : 56
}
}
}, {
"options" : {
"minimum_token_match" : 15
},
"tests" : {
"SortAlgo1_2_SortAlgo1_3" : {
"result_minimal_similarity" : 52.63158,
"result_maximum_similarity" : 53.333332,
"result_matched_token_number" : 40
},
"SortAlgo1_2_SortAlgo1_4" : {
"result_minimal_similarity" : 40.789474,
"result_maximum_similarity" : 52.542374,
"result_matched_token_number" : 31
},
"SortAlgo1_3_SortAlgo1_4" : {
"result_minimal_similarity" : 32.0,
"result_maximum_similarity" : 40.677967,
"result_matched_token_number" : 24
},
"SortAlgo_SortAlgo1_2" : {
"result_minimal_similarity" : 59.210526,
"result_maximum_similarity" : 80.35714,
"result_matched_token_number" : 45
},
"SortAlgo_SortAlgo1_4" : {
"result_minimal_similarity" : 52.542374,
"result_maximum_similarity" : 55.357143,
"result_matched_token_number" : 31
},
"SortAlgo_SortAlgo1_3" : {
"result_minimal_similarity" : 53.333332,
"result_maximum_similarity" : 71.42857,
"result_matched_token_number" : 40
}
}
} ]
DynamicTest
As already mentioned, there is now only one testFactory for all tests and languages with the permutation of test data and options. The display name of a test is composed of the language, the configuration and the tested data (e.g. JAVA: (15) SortAlgo_SortAlgo1).
@TestFactory
Collection<DynamicTest> dynamicOverAllTest() {
for (Entry<LanguageOption, Map<String, Path>> languageMap : LanguageToTestCaseMapper.entrySet()) {
LanguageOption currentLanguageOption = languageMap.getKey();
for (Entry<String, Path> languagePaths : languageMap.getValue().entrySet()) {
String[] fileNames = FileHelper.loadAllTestFileNames(languagePaths.getValue());
var testCases = JPlagTestSuiteHelper.getPermutation(fileNames, languagePaths.getValue());
var testCollection = new ArrayList<DynamicTest>();
for (Options option : options) {
for (var testCase : testCases) {
testCollection.add(DynamicTest
.dynamicTest(getTestCaseDisplayName(option, currentLanguageOption, testCase), () -> {
try {
runJPlagTestSuite(languagePaths.getValue().getFileName().toString(), option,
currentLanguageOption, testCase);
} finally {
JPlagTestSuiteHelper.clear();
}
}));
}
}
return testCollection;
}
}
return null;
}
Models
The models are as follows and are based on your examples.
public class ExpectedResult {
@JsonProperty("result_minimal_similarity")
private float resultSimilarityMinimum;
@JsonProperty("result_maximum_similarity")
private float resultSimilarityMaximum;
@JsonProperty("result_matched_token_number")
private int resultMatchedTokenNumber;
public ExpectedResult(float resultSimilarityMinimum, float resultSimilarityMaximum,
int resultMatchedTokenNumber) {
this.resultSimilarityMinimum = resultSimilarityMinimum;
this.resultSimilarityMaximum = resultSimilarityMaximum;
this.resultMatchedTokenNumber = resultMatchedTokenNumber;
}
public class Options {
@JsonProperty("minimum_token_match")
private int minimumTokenMatch;
public Options(int minimumTokenMatch)
{
this.minimumTokenMatch = minimumTokenMatch;
}
public Options(JPlagOptions jplagOptions)
{
this.minimumTokenMatch = jplagOptions.getMinimumTokenMatch();
}
public class ResultDescription {
@JsonIgnore
private LanguageOption languageOption;
@JsonProperty("options")
Options options;
@JsonProperty("tests")
Map<String, ExpectedResult> identifierToResultMap;
public ResultDescription(Options options, Map<String, ExpectedResult> identifierToResultMap)
{
this.options = options;
this.identifierToResultMap = identifierToResultMap;
}
public ResultDescription(Options options , JPlagComparison jPlagComparison , LanguageOption languageOption)
{
this.languageOption = languageOption;
this.options = options;
identifierToResultMap = new HashMap<>();
identifierToResultMap.put(JPlagTestSuiteHelper.getTestIdentifier(jPlagComparison), new ExpectedResult(jPlagComparison));
}
Summary
I'm not quite done with the changes yet, but this might be enough for a first round of feedback. I need some more time to make the test suite runnable again. The most important changes are:
- Revision of the json and models
- Only one dynamic test for all languages and for all test data
- To add new endToEnd tests for a language, you only have to add the test data in test/resources/LANGUAGE/...
-> It is important that the top folder is named like the LanguageOption enum type you want to test
There is only completely uncleaned code in this branch! It is only meant to be looked at, as I haven't had time to clean it up yet.
Overall looks good to me :) Some hints:
- The code of dynamicOverAllTest returns after having collected tests for one language instead of for all. However, as Timur said earlier, it is totally sufficient if the tests only support Java as of now. So if there are problems adjusting it, you could also simply remove the options for supporting dynamic languages.
- I would trim the "result_" prefix from the JSON values for brevity.
- All model classes should be records.
- I don't get the necessity for the 2nd constructor in both the Options and ResultDescription classes. In my opinion they should only be used for deserialising the JSON, so why do I need to initialize them with the other values? Especially for the ResultDescription, I think adding the language option and allowing initialization with a JPlagComparison object mixes concepts here.
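To illustrate the first hint: the collection has to be created once before the outer loop, and the return has to move behind it. A reduced sketch with plain strings instead of DynamicTest objects; all names are made up for the illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CollectAcrossLanguages {

    /** Collects one display name per language, option and test case, and returns only after all loops. */
    static List<String> collectTestNames(Map<String, List<String>> testCasesByLanguage, int[] minimumTokenMatches) {
        List<String> testNames = new ArrayList<>(); // created once, outside of all loops
        for (Map.Entry<String, List<String>> language : testCasesByLanguage.entrySet()) {
            for (int minimumTokenMatch : minimumTokenMatches) {
                for (String testCase : language.getValue()) {
                    testNames.add(language.getKey() + ": (" + minimumTokenMatch + ") " + testCase);
                }
            }
        }
        return testNames; // behind the outer loop, so no language is skipped
    }

    public static void main(String[] args) {
        Map<String, List<String>> testCasesByLanguage = new LinkedHashMap<>();
        testCasesByLanguage.put("JAVA", List.of("SortAlgo_SortAlgo1"));
        testCasesByLanguage.put("OTHER", List.of("A_B", "A_C"));
        collectTestNames(testCasesByLanguage, new int[] { 1, 15 }).forEach(System.out::println);
    }
}
```

With this shape the trailing `return null` is no longer needed either.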
Thanks for the quick feedback! I'm not sure about the dynamic language loading removal. Currently it only supports java, but if you want to test more than one language endToEnd, I think it's handy to only have to create the testdata. Timur what do you think about this?
The end-to-end tests are only supposed to run for the Java language frontend. So I think this should be fine.
If you can make it work for various languages without too much effort, it's a nice to have but don't spend too much time on it.
It is already implemented and would be more effort to remove it again than to take it further.
@SuyDesignz can you resolve all review comments that you addressed? If you are waiting for feedback in a comment just tag the reviewer again.
I tried to change the classes of the json objects to records. Here I ran into some problems with the deserialization. Apparently there are already some entries about this:
- Java 14/15 records can not set final field #1794
- Properties naming strategy do not work with Record #2992
- @JsonNaming does not work with Java 16 record types #3102
My conclusion is that I will not change the classes into records. I haven't found a working workaround yet. The class is attached.
@JsonDeserialize(as = ResultDescription.class)
public record ResultDescription(@JsonIgnore LanguageOption languageOption,
@JsonProperty("options") Options options,
@JsonProperty("tests") Map<String, ExpectedResult> identifierToResultMap,
@JsonIgnore JPlagComparison jplagComparison) {
public ResultDescription(LanguageOption languageOption, Options options, Map<String, ExpectedResult> identifierToResultMap, JPlagComparison jplagComparison){
this.identifierToResultMap = identifierToResultMap;
this.options = options;
this.languageOption = languageOption;
this.jplagComparison = jplagComparison;
}
public ResultDescription(Options options, Map<String, ExpectedResult> identifierToResultMap)
{
this(null , options , identifierToResultMap , null);
}
@JsonIgnore
public Map<String, ExpectedResult> getIdentifierResultMap()
{
return identifierToResultMap;
}
@JsonIgnore
public ExpectedResult getExpectedResultByIdentifier(String identifier)
{
return identifierToResultMap.get(identifier);
}
@JsonIgnore
public Options getOptions()
{
return options;
}
@JsonIgnore
public LanguageOption getLanguageOption()
{
return languageOption;
}
public void putIdenfifierToResultMap(String identifier, ExpectedResult expectedResult) {
identifierToResultMap.put(identifier, expectedResult);
}
@JsonDeserialize(as = Options.class)
public record Options(@JsonProperty("minimum_token_match") int minimumTokenMatch)
{
public Options(int minimumTokenMatch)
{
this.minimumTokenMatch = minimumTokenMatch;
}
@JsonIgnore
public int getMinimumTokenMatch() {
return minimumTokenMatch;
}
@JsonIgnore
@Override
public boolean equals(Object options) {
if (options instanceof Options) {
return minimumTokenMatch == ((Options) options).getMinimumTokenMatch();
} else {
return false;
}
}
@JsonIgnore
@Override
public String toString() {
return "Options [minimumTokenMatch=" + minimumTokenMatch+ "]";
}
}
@JsonDeserialize(as = ExpectedResult.class)
public record ExpectedResult(@JsonProperty("minimal_similarity")float resultSimilarityMinimum,
@JsonProperty("maximum_similarity") float resultSimilarityMaximum,
@JsonProperty("matched_token_number") int resultMatchedTokenNumber,
@JsonIgnore JPlagComparison jplagComparison)
{
public ExpectedResult(float resultSimilarityMinimum, float resultSimilarityMaximum,
int resultMatchedTokenNumber , JPlagComparison jplagComparison)
{
this.resultSimilarityMinimum = resultSimilarityMinimum;
this.resultSimilarityMaximum = resultSimilarityMaximum;
this.resultMatchedTokenNumber = resultMatchedTokenNumber;
this.jplagComparison = jplagComparison;
}
public ExpectedResult(JPlagComparison jplagComparison)
{
this(jplagComparison.minimalSimilarity() , jplagComparison.maximalSimilarity() , jplagComparison.getNumberOfMatchedTokens());
}
public ExpectedResult(float resultSimilarityMinimum, float resultSimilarityMaximum,
int resultMatchedTokenNumber) {
this(resultSimilarityMinimum, resultSimilarityMaximum, resultMatchedTokenNumber, null);
}
@JsonIgnore
public float getResultSimilarityMinimum() {
return resultSimilarityMinimum;
}
@JsonIgnore
public float getResultSimilarityMaximum() {
return resultSimilarityMaximum;
}
@JsonIgnore
public int getResultMatchedTokenNumber() {
return resultMatchedTokenNumber;
}
}
I tried to change the classes of the json objects to records. Here I ran into some problems with the deserialization. Apparently there are already some entries about this:
@SuyDesignz AFAIK this should be possible as we do it for the report generation (e.g. here). I just briefly glanced at the issues you linked, and there are workarounds mentioned in there by some commenters. So either check the report generation or have a look at the workarounds.
If I change the models into records, I receive this error message while deserializing:
com.fasterxml.jackson.databind.JsonMappingException: Can not set final java.lang.int field de.jplag.end_to_end_testing.modelRecord.ResultDescription$Options.minimumTokenMatch to java.lang.Integer (through reference chain: java.lang.Object[][0]->de.jplag.end_to_end_testing.modelRecord.ResultDescription["options"])
with this record:
@JsonDeserialize(as = Options.class)
public record Options(@JsonProperty("minimum_token_match") int minimumTokenMatch) {
public Options(int minimumTokenMatch) {
this.minimumTokenMatch = minimumTokenMatch;
}
@JsonIgnore
public int getMinimumTokenMatch() {
return minimumTokenMatch;
}
@JsonIgnore
@Override
public boolean equals(Object options) {
if (options instanceof Options) {
return minimumTokenMatch == ((Options) options).getMinimumTokenMatch();
} else {
return false;
}
}
@JsonIgnore
@Override
public String toString() {
return "Options [minimumTokenMatch=" + minimumTokenMatch + "]";
}
}
When I change the type to Integer, I get the same error message:
com.fasterxml.jackson.databind.JsonMappingException: Can not set final java.lang.Integer field de.jplag.end_to_end_testing.modelRecord.ResultDescription$Options.minimumTokenMatch to java.lang.Integer (through reference chain: java.lang.Object[][0]->de.jplag.end_to_end_testing.modelRecord.ResultDescription["options"])
Try to delete the constructor, and all methods (as records provide them by default), and also delete @JsonDeserialize(as = Options.class)
Removing the constructor worked for me!
public record ResultDescription(@JsonIgnore LanguageOption languageOption, @JsonProperty("options") Options options,
@JsonProperty("tests") Map<String, ExpectedResult> identifierToResultMap) {
@JsonIgnore
public ExpectedResult getExpectedResultByIdentifier(String identifier) {
return identifierToResultMap.get(identifier);
}
public void putIdenfifierToResultMap(String identifier, ExpectedResult expectedResult) {
identifierToResultMap.put(identifier, expectedResult);
}
}
public record Options(@JsonProperty("minimum_token_match") Integer minimumTokenMatch) {
/**
* Compares the option values with those of the passed object. This is necessary to find the correct results in the
* deserialized json file.
*/
@JsonIgnore
@Override
public boolean equals(Object options) {
if (options instanceof Options optionsCasted) {
// Integer objects must be compared by value, not with ==; needs java.util.Objects
return Objects.equals(minimumTokenMatch, optionsCasted.minimumTokenMatch());
} else {
return false;
}
}
/**
* Creates the hash code for the current Options object.
*/
@JsonIgnore
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + minimumTokenMatch;
return result;
}
@JsonIgnore
@Override
public String toString() {
return "Options [minimumTokenMatch=" + minimumTokenMatch + "]";
}
}
public record ExpectedResult(@JsonProperty("minimal_similarity") float resultSimilarityMinimum,
@JsonProperty("maximum_similarity") float resultSimilarityMaximum, @JsonProperty("matched_token_number") int resultMatchedTokenNumber) {
}
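As a side note on the advice above that records provide these methods by default: a record already compares its components by value, so a hand-written equals/hashCode is usually not needed for the lookup. A small JDK-only sketch; the class name `RecordEqualityDemo` is hypothetical and no Jackson annotations are involved:

```java
import java.util.HashMap;
import java.util.Map;

class RecordEqualityDemo {

    /** Same component as the Options record above, but without any overridden methods. */
    record Options(Integer minimumTokenMatch) {}

    /** Looks up a stored label by options; works because records generate equals and hashCode. */
    static String lookup(Map<Options, String> stored, int minimumTokenMatch) {
        return stored.get(new Options(minimumTokenMatch));
    }

    public static void main(String[] args) {
        Map<Options, String> stored = new HashMap<>();
        stored.put(new Options(1000), "results for minimum token match 1000");
        // A freshly created, equal record finds the stored entry
        System.out.println(lookup(stored, 1000));
        System.out.println(new Options(1000).equals(new Options(1000)));
    }
}
```

The value 1000 is chosen on purpose: it lies outside the range in which boxed Integer objects are guaranteed to be cached, so a comparison with == on the boxed values could fail there, while the generated equals does not.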
TODO:
- [x] add comments and documentation for public classes/records and functions
- [x] create the new README file for the new sequence of the endToEnd test suite
- [x] remove as many code smells as possible
- [x] review comments and documentation
Revised version is now available. The changes, as well as the documentation on how to create new endToEnd tests for each language, can be found in the README.md. The following has been changed:
- json format
"options" : {
"minimum_token_match" : 1
},
"tests" : {
"SortAlgo-SortAlgo5" : {
"minimal_similarity" : 82.14286,
"maximum_similarity" : 82.14286,
"matched_token_number" : 46
},
- Creating the tests runs completely dynamically, which means that even for new languages nothing has to be changed in the source code. Only test data must be made available. Can be found in the README under "Copying Plagiarism To The Resources".
- json serialization and deserialization now runs via records
- it is now possible to run the tests in different JPlag test settings. JPlag - option variants for the endToEnd tests #590
- The comparisons in the tests themselves have been adjusted to provide more accurate information about the incorrect values.
There were <3> validation error(s):
minimalSimilarity was 82.1 but expected 82.14286
maximalSimilarity was 82.1 but expected 82.14286
numberOfMatchedTokens was 41 but expected 46
==> expected: <true> but was: <false>
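Such a message can be produced by collecting all mismatches first and asserting once at the end. A sketch of that idea, not the actual test code; `ValidationSketch` and its `Result` record are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class ValidationSketch {

    record Result(float minimalSimilarity, float maximalSimilarity, int numberOfMatchedTokens) {}

    /** Compares all three values and returns one message per mismatch instead of stopping at the first. */
    static List<String> validate(Result actual, Result expected) {
        List<String> errors = new ArrayList<>();
        if (Float.compare(actual.minimalSimilarity(), expected.minimalSimilarity()) != 0) {
            errors.add("minimalSimilarity was " + actual.minimalSimilarity() + " but expected " + expected.minimalSimilarity());
        }
        if (Float.compare(actual.maximalSimilarity(), expected.maximalSimilarity()) != 0) {
            errors.add("maximalSimilarity was " + actual.maximalSimilarity() + " but expected " + expected.maximalSimilarity());
        }
        if (actual.numberOfMatchedTokens() != expected.numberOfMatchedTokens()) {
            errors.add("numberOfMatchedTokens was " + actual.numberOfMatchedTokens() + " but expected " + expected.numberOfMatchedTokens());
        }
        return errors;
    }

    /** Builds the summary that would be passed as the message of a single assertion. */
    static String describe(List<String> errors) {
        return "There were <" + errors.size() + "> validation error(s):" + System.lineSeparator()
                + String.join(System.lineSeparator(), errors);
    }

    public static void main(String[] args) {
        List<String> errors = validate(new Result(82.1f, 82.1f, 41), new Result(82.14286f, 82.14286f, 46));
        System.out.println(describe(errors));
    }
}
```

In a JUnit test the list would then be checked with something like `assertTrue(errors.isEmpty(), describe(errors))`, which explains the trailing "expected: <true> but was: <false>" line.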
For all further information, the README should be a good source of information. Here everything concerning the basics of the TestSuite should be available and documented.
@SuyDesignz when I run mvn clean package assembly:single locally with your branch, it produces junk files in the jplag/jplag directory. This is probably unintended, right?
This should not actually happen, but it only started with the current master branch. I still do not know what the reason is.
If you look into A-C.json, there are also completely different names than the ones I use for the tests:
{"file1":"GSTiling.java","file2":"GSTiling.java","start1":6,"end1":247,