traprange

Can we get all PDF data into a single String variable instead of getting data page by page?

Open Phannd7 opened this issue 9 years ago • 1 comments

Hi Tho,

Currently, I'm using the "get" method to get the PDF data from a specific page. Is there a way to get all of the PDF data at once instead of getting it page by page? My code:

public static int rowNumberOfPDFFile(String pdfLink, int pagePDFNumber) throws IOException {
    PDFTableExtractor extractor = new PDFTableExtractor();
    List<Table> tables = extractor.setSource(pdfLink).extract();
    // get data from page 1 to String html. Page number starts from 0
    String html = tables.get(pagePDFNumber).toHtml();

    html = html.substring(html.indexOf("border='1'>") + 11);
    int rowNumber = org.apache.commons.lang3.StringUtils.countMatches(html, "/tr");
    return rowNumber;
}

I would like to get all of the PDF data into the "html" variable. Could you please help?

Thanks, Phan Nguyen

Phannd7 avatar Oct 12 '16 01:10 Phannd7

Hi Phan Nguyen,

I think you can do it by getting the HTML content of the tables on all pages, then using an HTML parser such as Jsoup to parse the table content and put it all together. Alternatively, you can loop through all of the table models returned by PDFTableExtractor.extract().
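The second option (looping over every table and joining the HTML) could look roughly like the sketch below. It is only an illustration: the class and method names (AllPagesHtml, joinHtml, countOccurrences) are made up for this example, and plain strings stand in for the per-page HTML so that the code runs without the library. With traprange, the call would presumably be joinHtml(tables, Table::toHtml), where tables is the list returned by extract() in the question.

```java
import java.util.List;
import java.util.function.Function;

public class AllPagesHtml {

    /** Joins the HTML of every item into one string, in list order. */
    public static <T> String joinHtml(List<T> tables, Function<T, String> toHtml) {
        StringBuilder sb = new StringBuilder();
        for (T table : tables) {
            sb.append(toHtml.apply(table));
        }
        return sb.toString();
    }

    /** Counts non-overlapping occurrences of token in text. */
    public static int countOccurrences(String text, String token) {
        if (token.isEmpty()) {
            return 0; // avoid an endless loop on an empty token
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) != -1) {
            count++;
            from += token.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: these strings stand in for what toHtml() returns per page.
        List<String> pages = List.of(
            "<table border='1'><tr><td>a</td></tr><tr><td>b</td></tr></table>",
            "<table border='1'><tr><td>c</td></tr></table>");

        // All pages in a single String, then one row count over the whole document.
        String html = joinHtml(pages, s -> s);
        System.out.println(countOccurrences(html, "</tr>")); // prints 3
    }
}
```

Counting "</tr>" on the joined string gives the total number of rows across all pages, so the per-page substring step from the question is no longer needed.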

Sorry for my late reply.

Regards, Tho Q Luong


thoqbk avatar Oct 14 '16 14:10 thoqbk