tabula-java

Last rows of table content not extracted.

Open micklegill opened this issue 7 years ago • 2 comments

When extracting a table using lattice extraction, the last rows of the table are not detected. I am posting my PDF file along with the command used.

Command used: java -jar tabula-1.0.1-jar-with-dependencies.jar -l -p 2 Tables.pdf -o t.csv

Tables.pdf

micklegill, Apr 25 '18 06:04

I'm getting the same issue. Is it going to be resolved any time soon?

sandeepsharma-kgp, May 05 '20 20:05

I have the same issue. In code, extending the detected Rectangle's height slightly seems to "fix" the issue (I'm using BottomMargin=2):

// `extractor` is presumably an ObjectExtractor opened on the PDF; BottomMargin = 2.
NurminenDetectionAlgorithm nda = new NurminenDetectionAlgorithm();
SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
PageIterator pages = extractor.extract();
List<Table> tables = new ArrayList<Table>();
while (pages.hasNext()) {
    Page page = pages.next();
    List<Rectangle> areas = nda.detect(page);
    for (Rectangle a : areas) {
        // FIXME: Extend Rectangle by 2pt down to read last row
        a.setBottom(a.getBottom() + BottomMargin);
        Page sub_page = page.getArea(a);
        tables.addAll(sea.extract(sub_page));
    }
}

hs-neax, Nov 20 '20 12:11
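
One caveat with the workaround above: if a detected area already ends near the bottom of the page, adding a margin could push it past the page bounds. The following is a minimal sketch of a padding helper that clamps the extended area to the page, written against the standard java.awt.geom.Rectangle2D.Float only (which, as far as I know, tabula's Rectangle extends). The class name BottomPadding, the method padBottom, and the 612 x 792 pt page size are hypothetical and not part of tabula-java; the sketch also assumes the detected area starts inside the page.

```java
import java.awt.geom.Rectangle2D;

public class BottomPadding {

    /**
     * Returns a copy of `area` whose bottom edge is moved down by `margin`
     * points, but never below the bottom edge of `pageBounds`.
     * Coordinates follow the Java2D convention: y grows downward.
     */
    static Rectangle2D.Float padBottom(Rectangle2D.Float area,
                                       float margin,
                                       Rectangle2D.Float pageBounds) {
        float top = area.y;
        float paddedBottom = area.y + area.height + margin;
        float pageBottom = pageBounds.y + pageBounds.height;
        // Clamp so the padded area does not extend past the page.
        float bottom = Math.min(paddedBottom, pageBottom);
        return new Rectangle2D.Float(area.x, top, area.width, bottom - top);
    }

    public static void main(String[] args) {
        // Hypothetical US Letter page, in points.
        Rectangle2D.Float page = new Rectangle2D.Float(0f, 0f, 612f, 792f);

        // Area well inside the page: the full 2pt margin is applied.
        Rectangle2D.Float inner = new Rectangle2D.Float(50f, 100f, 400f, 200f);
        System.out.println(padBottom(inner, 2f, page).height);

        // Area ending 1pt above the page bottom: only 1pt can be added.
        Rectangle2D.Float nearEdge = new Rectangle2D.Float(50f, 700f, 400f, 91f);
        System.out.println(padBottom(nearEdge, 2f, page).height);
    }
}
```

In the loop above, the padded bottom could then be applied with a.setBottom(...) before calling page.getArea(a). Whether 2pt is enough likely depends on how thick the table's bottom ruling line is in a given PDF, so the margin is best treated as a tunable value rather than a fix.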