Fix flaky test TestSpreadsheetExtractor#testRTL
Test failure reproduction
mvn install -pl . -am -DskipTests -Dsign.skip
mvn -pl . edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=technology.tabula.TestSpreadsheetExtractor#testRTL
NonDex detected the flakiness and reported the following error:
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.436 s <<< FAILURE! - in technology.tabula.TestSpreadsheetExtractor
[ERROR] testRTL(technology.tabula.TestSpreadsheetExtractor) Time elapsed: 0.434 s <<< FAILURE!
org.junit.ComparisonFailure: expected:<[اسمي سلطان]> but was:<[]>
at technology.tabula.TestSpreadsheetExtractor.testRTL(TestSpreadsheetExtractor.java:458)
Root cause and fix
The failing assertion is at line 458 of TestSpreadsheetExtractor.java:
assertEquals("اسمي سلطان", table.getRows().get(1).get(1).getText());
The flakiness is caused by findSpreadsheetsFromCells() in SpreadsheetExtractionAlgorithm.java (line 183). Because it uses HashSet and HashMap, whose iteration order is unspecified, the function can return its results in a different order:
public static List<Rectangle> findSpreadsheetsFromCells(List<? extends Rectangle> cells) {
// via: http://stackoverflow.com/questions/13746284/merging-multiple-adjacent-rectangles-into-one-polygon
List<Rectangle> rectangles = new ArrayList<>();
Set<Point2D> pointSet = new HashSet<>();
Map<Point2D, Point2D> edgesH = new HashMap<>();
Map<Point2D, Point2D> edgesV = new HashMap<>();
This is the source of the flakiness. To fix it, I replaced HashSet and HashMap with LinkedHashSet and LinkedHashMap. LinkedHashSet and LinkedHashMap iterate in insertion order, whereas HashSet and HashMap make no guarantee about iteration order, and NonDex deliberately varies that order to expose code that depends on it. With this change the function is deterministic: it always returns its results in the same order.
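To illustrate the ordering guarantee the fix relies on, here is a minimal standalone sketch. It is not the tabula code: the class IterationOrderSketch and its two helper methods are hypothetical names made up for this example, and the points are arbitrary. It only shows that LinkedHashSet and LinkedHashMap hand elements back in the order they were first inserted.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class, not part of tabula.
public class IterationOrderSketch {

    // Deduplicates the points; iteration follows first-insertion order.
    static List<Point2D> uniqueInInsertionOrder(List<Point2D> points) {
        Set<Point2D> pointSet = new LinkedHashSet<>(points);
        return new ArrayList<>(pointSet);
    }

    // Maps each point to the next one in the list and returns the keys.
    // Re-putting an existing key does not change its position.
    static List<Point2D> edgeKeysInInsertionOrder(List<Point2D> points) {
        Map<Point2D, Point2D> edges = new LinkedHashMap<>();
        for (int i = 0; i + 1 < points.size(); i++) {
            edges.put(points.get(i), points.get(i + 1));
        }
        return new ArrayList<>(edges.keySet());
    }

    public static void main(String[] args) {
        // Arbitrary sample points; (3, 1) appears twice on purpose.
        List<Point2D> points = Arrays.asList(
                new Point2D.Float(3, 1),
                new Point2D.Float(0, 0),
                new Point2D.Float(3, 1),
                new Point2D.Float(2, 5));
        System.out.println(uniqueInInsertionOrder(points));
        System.out.println(edgeKeysInInsertionOrder(points));
    }
}
```

With HashSet and HashMap in place of the linked variants, the same two methods would still return the same elements, but the order would depend on the hash implementation, which is exactly what NonDex perturbs.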