
Tabula App and tabula-java Command line Utility giving different Outputs

Open Kanz95 opened this issue 7 years ago • 7 comments

FT801167430603.pdf

I have attached a PDF that gives different output in the tabula-java command-line utility and in the Tabula app. In the app, clicking "autodetect tables" finds the tables on all pages. With the command-line utility, I only get the tables on the last two pages.

This is the command that I used: java -jar tabula-1.0.1-jar-with-dependencies.jar --guess -d -p all -t FT801167430603.pdf

Kanz95 avatar Mar 12 '18 10:03 Kanz95

Thanks, @Kanz95

Which version of the Tabula app are you using? The current one (1.2.0) only detects tables in the last two pages as well.

jazzido avatar Mar 12 '18 15:03 jazzido

Tabula_App_Output.zip

Thanks, @jazzido. I used the latest Tabula app, 1.2.0 (the tabula-jar.zip version for Linux), and I have attached a screenshot and the CSV output from the app. It detects the tables on all 4 pages.

Kanz95 avatar Mar 19 '18 07:03 Kanz95

@jazzido Can you please look into this issue as soon as possible? We are building an app on top of it, and it has a very close deadline.

Kanz95 avatar Mar 21 '18 08:03 Kanz95

Hi @Kanz95,

No.

I don't know when I'm going to be able to look at this (Tabula is a side project, I don't get paid for it). If you're building an app, you might want to devote some of your resources to fix the issue yourself.

jazzido avatar Mar 21 '18 14:03 jazzido

…and contribute back the fix, if you're so inclined.

jazzido avatar Mar 21 '18 14:03 jazzido

The bug is in the file: https://github.com/tabulapdf/tabula-java/blob/master/src/main/java/technology/tabula/detectors/NurminenDetectionAlgorithm.java line: 488

if (edgeCountsPerLine[i][TextEdge.LEFT] > 2 &&

it should be: if (edgeCountsPerLine[i][TextEdge.LEFT] > 1 &&
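To illustrate what lowering that threshold does, here is a minimal, self-contained sketch. It is not tabula-java's code: the class, the `firstIndexAbove` helper, the `LEFT` constant, the scan direction, and the sample counts are all hypothetical and only mimic the shape of the condition quoted above. It shows that `> 1` accepts an entry that `> 2` rejects, so a different row can be picked.

```java
public class EdgeThresholdSketch {
    // Hypothetical column index standing in for TextEdge.LEFT from the thread.
    static final int LEFT = 0;

    /**
     * Scans from the highest index down and returns the first index whose
     * LEFT count is strictly greater than minCount, or -1 if none qualifies.
     * The scan direction is an assumption made for this illustration.
     */
    static int firstIndexAbove(int[][] edgeCountsPerLine, int minCount) {
        for (int i = edgeCountsPerLine.length - 1; i >= 0; i--) {
            if (edgeCountsPerLine[i][LEFT] > minCount) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up counts, one LEFT value per row.
        int[][] counts = { {5}, {3}, {2}, {0} };

        System.out.println(firstIndexAbove(counts, 2)); // current condition: > 2
        System.out.println(firstIndexAbove(counts, 1)); // proposed condition: > 1
    }
}
```

With these made-up counts the stricter check picks index 1 and the looser one picks index 2. Whether the looser check helps or hurts on a real PDF depends on the page, so it is worth testing against several documents before changing it.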

muoithang avatar May 18 '18 23:05 muoithang

The bug is in the file: https://github.com/tabulapdf/tabula-java/blob/master/src/main/java/technology/tabula/detectors/NurminenDetectionAlgorithm.java line: 488

if (edgeCountsPerLine[i][TextEdge.LEFT] > 2 &&

it should be: if (edgeCountsPerLine[i][TextEdge.LEFT] > 1 &&

I tried this and it was not the answer; the results got worse.

lorenzomartinezdev avatar Apr 15 '19 07:04 lorenzomartinezdev