When a word touches a cell, it can't be extracted
I use tabula in my project and found a problem when using SpreadsheetExtractionAlgorithm to extract words from a PDF document: when a word touches the cell, it is not extracted. I compared against the file named school, and it has the same problem, so I fixed them.
Hope tabula will get better!
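The description does not show the patch itself. As a hedged illustration of how this kind of bug can arise, the Java sketch below assumes that a word is assigned to a cell only if the word's bounding box lies entirely inside the cell's rectangle. Under that assumption, a word whose box crosses a ruling line by even half a point belongs to no cell and disappears from the output. `CellMembership`, `strictlyInside`, `insideWithTolerance` and the 1-point tolerance are hypothetical names and values, not tabula's API, and tabula's real cell-assignment code may work differently.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical illustration, not tabula's actual cell-assignment code.
public class CellMembership {

    // Strict test: the word's bounding box must lie entirely inside the cell.
    public static boolean strictlyInside(Rectangle2D cell, Rectangle2D word) {
        return cell.contains(word);
    }

    // Tolerant test: grow the cell by `tolerance` points on every side first,
    // so a word whose box touches or slightly crosses a ruling line still counts.
    public static boolean insideWithTolerance(Rectangle2D cell, Rectangle2D word, double tolerance) {
        Rectangle2D grown = new Rectangle2D.Double(
                cell.getX() - tolerance,
                cell.getY() - tolerance,
                cell.getWidth() + 2 * tolerance,
                cell.getHeight() + 2 * tolerance);
        return grown.contains(word);
    }

    public static void main(String[] args) {
        // Cell spanning x = 100..150, y = 200..220 (PDF points).
        Rectangle2D cell = new Rectangle2D.Double(100, 200, 50, 20);
        // Word whose right edge (x = 150.5) crosses the cell's right ruling by 0.5 pt.
        Rectangle2D word = new Rectangle2D.Double(130, 205, 20.5, 10);

        System.out.println(strictlyInside(cell, word));           // false: the word is dropped
        System.out.println(insideWithTolerance(cell, word, 1.0)); // true: the word is kept
    }
}
```

Growing the cell is only one option. A generous tolerance can place a word in two adjacent cells at once, so an alternative is to assign each word to the single cell that contains the largest share of its area.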
Hi, we are software engineering researchers working on a novel GitHub bot that helps developers understand build failures.
We ran our prototype bot, flacocobot, on this failing PR; the results are below.
What do you think?
Thanks a lot! --Davide (@dginelli), André (@andre15silva), Matias (@martinezmatias), Benjamin (@danglotb), Martin (@monperrus)
Line 27 of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 75.59%.
Failing tests that cover this line:
technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly
technology.tabula.TestBasicExtractor#testEmptyRegion
technology.tabula.TestSpreadsheetExtractor#testNaturalOrderOfRectanglesDoesNotBreakContract
technology.tabula.TestBasicExtractor#testColumnRecognition
technology.tabula.TestSpreadsheetExtractor#testAnotherExtractTableWithExternallyDefinedRulings
technology.tabula.TestBasicExtractor#testCheckSqueezeDoesntBreak
technology.tabula.TestSpreadsheetExtractor#testDontStackOverflowQuicksort
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtractionIssue656
technology.tabula.TestSpreadsheetExtractor#testIncompleteGrid
technology.tabula.TestSpreadsheetExtractor#testRTL
technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly2
technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly3
technology.tabula.TestWriters#testTSVWriter
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetsSortedByTopAndRight
technology.tabula.TestSpreadsheetExtractor#testRealLifeRTL
technology.tabula.TestSpreadsheetExtractor#testShouldDetectASingleSpreadsheet
technology.tabula.TestBasicExtractor#testNaturalOrderOfRectangles
technology.tabula.TestSpreadsheetExtractor#testExtractSpreadsheetWithinAnArea
technology.tabula.TestBasicExtractor#testVerticalRulingsPreventMergingOfColumns
technology.tabula.TestSpreadsheetExtractor#testDontRaiseSortException
technology.tabula.TestSpreadsheetExtractor#testExtractColumnsCorrectly3
technology.tabula.TestBasicExtractor#testRemoveSequentialSpaces
technology.tabula.TestSpreadsheetExtractor#testMergeLinesCloseToEachOther
technology.tabula.TestSpreadsheetExtractor#testSpanningCells
technology.tabula.TestSpreadsheetExtractor#testExtractTableWithExternallyDefinedRulings
technology.tabula.TestSpreadsheetExtractor#testSpanningCellsToCsv
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtraction
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetWithNoBoundingFrameShouldBeSpreadsheet
Line 20 of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 55.33%.
Failing tests that cover this line:
technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly
technology.tabula.TestBasicExtractor#testEmptyRegion
technology.tabula.TestBasicExtractor#testColumnRecognition
technology.tabula.TestBasicExtractor#testCheckSqueezeDoesntBreak
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtractionIssue656
technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly3
technology.tabula.TestWriters#testTSVWriter
technology.tabula.TestSpreadsheetExtractor#testShouldDetectASingleSpreadsheet
technology.tabula.TestSpreadsheetExtractor#testExtractSpreadsheetWithinAnArea
technology.tabula.TestBasicExtractor#testVerticalRulingsPreventMergingOfColumns
technology.tabula.TestSpreadsheetExtractor#testDontRaiseSortException
technology.tabula.TestSpreadsheetExtractor#testExtractColumnsCorrectly3
technology.tabula.TestBasicExtractor#testRemoveSequentialSpaces
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtraction
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetWithNoBoundingFrameShouldBeSpreadsheet
Line 80 of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 55.33%.
Failing tests that cover this line:
technology.tabula.TestWriters#testCSVSerializeInfinity
technology.tabula.TestWriters#testCSVSerializeTwoTables
technology.tabula.TestCommandLineApp#testExtractSpreadsheetWithArea
technology.tabula.TestCommandLineApp#testExtractWithMultiplePercentArea
technology.tabula.TestCommandLineApp#testExtractCSVWithArea
technology.tabula.TestCommandLineApp#testExtractWithMultipleAbsoluteArea
technology.tabula.TestWriters#testCSVWriter
technology.tabula.TestBasicExtractor#testRealLifeRTL2
technology.tabula.TestCommandLineApp#testGuessOption
technology.tabula.TestWriters#testCSVMultilineRow
technology.tabula.TestBasicExtractor#testTableWithMultilineHeader
technology.tabula.TestCommandLineApp#testLatticeModeWithColumnOption
technology.tabula.TestCommandLineApp#testExtractWithPercentAndAbsoluteArea
technology.tabula.TestCommandLineApp#testExtractSpreadsheetWithAreaAndNewFile
technology.tabula.TestCommandLineApp#testExtractBatchSpreadsheetWithArea
Line 16 of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 42.86%.
Failing tests that cover this line:
technology.tabula.TestBasicExtractor#testColumnRecognition
technology.tabula.TestBasicExtractor#testCheckSqueezeDoesntBreak
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtractionIssue656
technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly3
technology.tabula.TestWriters#testTSVWriter
technology.tabula.TestBasicExtractor#testVerticalRulingsPreventMergingOfColumns
technology.tabula.TestSpreadsheetExtractor#testExtractColumnsCorrectly3
technology.tabula.TestBasicExtractor#testRemoveSequentialSpaces
technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtraction
Line 66 of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 31.94%.
Failing tests that cover this line:
technology.tabula.TestWriters#testJSONSerializeTwoTables
technology.tabula.TestWriters#testJSONSerializeInfinity
technology.tabula.TestCommandLineApp#testExtractJSONWithArea
technology.tabula.TestWriters#testJSONWriter
technology.tabula.TestCommandLineApp#testLatticeModeWithColumnAndMultipleAreasOption
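The suspiciousness values above are consistent with the Ochiai formula, ochiai(s) = ef / sqrt(totalFailed * (ef + ep)), where ef and ep count the failing and passing tests that execute line s. The Java sketch below assumes the bot scores lines with Ochiai (Flacoco's default formula, as far as we know), that 49 tests fail in total, and that no passing test covers these lines. Neither the total of 49 nor ep = 0 is stated in the report; they are inferred because, together with the failing-test counts listed above (28, 15, 15, 9 and 5), they reproduce each reported percentage to two decimals. The class name `Ochiai` is illustrative only.

```java
import java.util.Locale;

public class Ochiai {

    /**
     * Ochiai suspiciousness of one line.
     *
     * @param failedCovering failing tests that execute the line (ef)
     * @param passedCovering passing tests that execute the line (ep)
     * @param totalFailed    failing tests in the whole run
     */
    public static double ochiai(int failedCovering, int passedCovering, int totalFailed) {
        double denominator = Math.sqrt((double) totalFailed * (failedCovering + passedCovering));
        // An uncovered line, or a run with no failures, scores 0 instead of NaN.
        return denominator == 0 ? 0.0 : failedCovering / denominator;
    }

    public static void main(String[] args) {
        // ASSUMPTION: 49 failing tests overall, and no passing test covers these lines.
        int totalFailed = 49;
        // Failing tests listed for lines 27, 20 and 80 (both 15), 16 and 66.
        int[] failedCovering = {28, 15, 9, 5};
        for (int ef : failedCovering) {
            double score = ochiai(ef, 0, totalFailed);
            System.out.println(String.format(Locale.ROOT, "ef=%d -> %.2f%%", ef, 100 * score));
        }
        // Prints:
        // ef=28 -> 75.59%
        // ef=15 -> 55.33%
        // ef=9 -> 42.86%
        // ef=5 -> 31.94%
    }
}
```

With ep = 0 the formula reduces to sqrt(ef / totalFailed), so the ranking here simply reflects how many of the failing tests execute each line.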