tabula-java

Words that touch the cell border can't be extracted

Open • Zinner0304 opened this issue 4 years ago • 1 comment

I use tabula in my project and found a problem when using SpreadsheetExtractionAlgorithm to extract words from a PDF document: when a word touches the border of its cell, it can't be extracted. I compared this with the file named school, and it has the same problem, so I fixed both. Hope tabula will get better!
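To illustrate the kind of failure described above, here is a minimal sketch, not tabula's actual code. It assumes the text is dropped because a word's bounding box pokes slightly past the cell rectangle and so fails a strict containment check. The class `CellTouchSketch`, its method names, and the tolerance value are all hypothetical.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical illustration only; these names do not exist in tabula-java.
public class CellTouchSketch {

    // Strict check: the word must lie entirely inside the cell.
    static boolean strictlyInside(Rectangle2D cell, Rectangle2D word) {
        return cell.contains(word);
    }

    // Tolerant check: grow the cell by `tol` points on every side first,
    // so a word that overlaps the border by less than `tol` still counts.
    static boolean insideWithTolerance(Rectangle2D cell, Rectangle2D word, double tol) {
        Rectangle2D grown = new Rectangle2D.Double(
                cell.getX() - tol,
                cell.getY() - tol,
                cell.getWidth() + 2 * tol,
                cell.getHeight() + 2 * tol);
        return grown.contains(word);
    }

    public static void main(String[] args) {
        Rectangle2D cell = new Rectangle2D.Double(10, 10, 100, 20);
        // The word's left edge sits 0.4pt outside the cell's left border.
        Rectangle2D word = new Rectangle2D.Double(9.6, 12, 30, 10);

        System.out.println("strict:   " + strictlyInside(cell, word));
        System.out.println("tolerant: " + insideWithTolerance(cell, word, 1.0));
    }
}
```

With these made-up coordinates the strict check rejects the word and the tolerant check accepts it. Whether a tolerance, an intersection test, or something else is the right fix for tabula depends on how the extraction algorithm assigns text to cells, which this thread does not show.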

Zinner0304 • Dec 05 '21 08:12

Hi, we are researchers in software engineering, working on a novel GitHub bot for helping understand build failures.

We have run our prototype bot flacocobot on this failing PR and below are the results.

What do you think?

Thanks a lot! --Davide (@dginelli), André (@andre15silva), Matias (@martinezmatias), Benjamin (@danglotb), Martin (@monperrus)


The line (27) of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 75.59%.

Failing tests that cover this line
  • technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly
  • technology.tabula.TestBasicExtractor#testEmptyRegion
  • technology.tabula.TestSpreadsheetExtractor#testNaturalOrderOfRectanglesDoesNotBreakContract
  • technology.tabula.TestBasicExtractor#testColumnRecognition
  • technology.tabula.TestSpreadsheetExtractor#testAnotherExtractTableWithExternallyDefinedRulings
  • technology.tabula.TestBasicExtractor#testCheckSqueezeDoesntBreak
  • technology.tabula.TestSpreadsheetExtractor#testDontStackOverflowQuicksort
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtractionIssue656
  • technology.tabula.TestSpreadsheetExtractor#testIncompleteGrid
  • technology.tabula.TestSpreadsheetExtractor#testRTL
  • technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly2
  • technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly3
  • technology.tabula.TestWriters#testTSVWriter
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetsSortedByTopAndRight
  • technology.tabula.TestSpreadsheetExtractor#testRealLifeRTL
  • technology.tabula.TestSpreadsheetExtractor#testShouldDetectASingleSpreadsheet
  • technology.tabula.TestBasicExtractor#testNaturalOrderOfRectangles
  • technology.tabula.TestSpreadsheetExtractor#testExtractSpreadsheetWithinAnArea
  • technology.tabula.TestBasicExtractor#testVerticalRulingsPreventMergingOfColumns
  • technology.tabula.TestSpreadsheetExtractor#testDontRaiseSortException
  • technology.tabula.TestSpreadsheetExtractor#testExtractColumnsCorrectly3
  • technology.tabula.TestBasicExtractor#testRemoveSequentialSpaces
  • technology.tabula.TestSpreadsheetExtractor#testMergeLinesCloseToEachOther
  • technology.tabula.TestSpreadsheetExtractor#testSpanningCells
  • technology.tabula.TestSpreadsheetExtractor#testExtractTableWithExternallyDefinedRulings
  • technology.tabula.TestSpreadsheetExtractor#testSpanningCellsToCsv
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtraction
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetWithNoBoundingFrameShouldBeSpreadsheet

The line (20) of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 55.33%.

Failing tests that cover this line
  • technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly
  • technology.tabula.TestBasicExtractor#testEmptyRegion
  • technology.tabula.TestBasicExtractor#testColumnRecognition
  • technology.tabula.TestBasicExtractor#testCheckSqueezeDoesntBreak
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtractionIssue656
  • technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly3
  • technology.tabula.TestWriters#testTSVWriter
  • technology.tabula.TestSpreadsheetExtractor#testShouldDetectASingleSpreadsheet
  • technology.tabula.TestSpreadsheetExtractor#testExtractSpreadsheetWithinAnArea
  • technology.tabula.TestBasicExtractor#testVerticalRulingsPreventMergingOfColumns
  • technology.tabula.TestSpreadsheetExtractor#testDontRaiseSortException
  • technology.tabula.TestSpreadsheetExtractor#testExtractColumnsCorrectly3
  • technology.tabula.TestBasicExtractor#testRemoveSequentialSpaces
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtraction
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetWithNoBoundingFrameShouldBeSpreadsheet

The line (80) of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 55.33%.

Failing tests that cover this line
  • technology.tabula.TestWriters#testCSVSerializeInfinity
  • technology.tabula.TestWriters#testCSVSerializeTwoTables
  • technology.tabula.TestCommandLineApp#testExtractSpreadsheetWithArea
  • technology.tabula.TestCommandLineApp#testExtractWithMultiplePercentArea
  • technology.tabula.TestCommandLineApp#testExtractCSVWithArea
  • technology.tabula.TestCommandLineApp#testExtractWithMultipleAbsoluteArea
  • technology.tabula.TestWriters#testCSVWriter
  • technology.tabula.TestBasicExtractor#testRealLifeRTL2
  • technology.tabula.TestCommandLineApp#testGuessOption
  • technology.tabula.TestWriters#testCSVMultilineRow
  • technology.tabula.TestBasicExtractor#testTableWithMultilineHeader
  • technology.tabula.TestCommandLineApp#testLatticeModeWithColumnOption
  • technology.tabula.TestCommandLineApp#testExtractWithPercentAndAbsoluteArea
  • technology.tabula.TestCommandLineApp#testExtractSpreadsheetWithAreaAndNewFile
  • technology.tabula.TestCommandLineApp#testExtractBatchSpreadsheetWithArea

The line (16) of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 42.86%.

Failing tests that cover this line
  • technology.tabula.TestBasicExtractor#testColumnRecognition
  • technology.tabula.TestBasicExtractor#testCheckSqueezeDoesntBreak
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtractionIssue656
  • technology.tabula.TestBasicExtractor#testExtractColumnsCorrectly3
  • technology.tabula.TestWriters#testTSVWriter
  • technology.tabula.TestBasicExtractor#testVerticalRulingsPreventMergingOfColumns
  • technology.tabula.TestSpreadsheetExtractor#testExtractColumnsCorrectly3
  • technology.tabula.TestBasicExtractor#testRemoveSequentialSpaces
  • technology.tabula.TestSpreadsheetExtractor#testSpreadsheetExtraction

The line (66) of the file technology/tabula/UtilsForTesting has been identified with a suspiciousness value of 31.94%.

Failing tests that cover this line
  • technology.tabula.TestWriters#testJSONSerializeTwoTables
  • technology.tabula.TestWriters#testJSONSerializeInfinity
  • technology.tabula.TestCommandLineApp#testExtractJSONWithArea
  • technology.tabula.TestWriters#testJSONWriter
  • technology.tabula.TestCommandLineApp#testLatticeModeWithColumnAndMultipleAreasOption
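
For readers unfamiliar with the suspiciousness values above: scores like these typically come from spectrum-based fault localization, which ranks a line by how strongly its coverage correlates with failing tests. The sketch below assumes the Ochiai formula. The thread does not say which formula or which pass/fail counts the bot used, so the class name and all inputs are hypothetical.

```java
// Hypothetical sketch of an Ochiai-style suspiciousness score.
// Not taken from the bot's output; the counts in main are made up.
public class OchiaiSketch {

    // failedCover: failing tests that execute the line
    // passedCover: passing tests that execute the line
    // totalFailed: all failing tests in the suite
    static double ochiai(int failedCover, int passedCover, int totalFailed) {
        int covering = failedCover + passedCover;
        if (totalFailed == 0 || covering == 0) {
            return 0.0; // no failures, or the line is never executed
        }
        return failedCover / Math.sqrt((double) totalFailed * covering);
    }

    public static void main(String[] args) {
        // 3 of 4 failing tests and 1 passing test execute the line.
        System.out.println(ochiai(3, 1, 4)); // 3 / sqrt(4 * 4) = 0.75
    }
}
```

Under this formula, a line executed by most failing tests and few passing tests scores close to 1, while a line that no failing test executes scores 0.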

monperrus • Dec 09 '21 13:12