Can we get all PDF data into the String variable, instead of getting data page by page?
Hi a. Tho,
Currently, I'm using the "get" method to get PDF data from a specific page. Is it possible to get all the PDF data at once, instead of getting it page by page like that? My code:
public static int rowNumberOfPDFFile(String pdfLink, int pagePDFNumber) throws IOException {
    PDFTableExtractor extractor = new PDFTableExtractor();
    List<Table> tables = extractor.setSource(pdfLink).extract();
    // get data from the given page into String html. Page number starts from 0
    String html = tables.get(pagePDFNumber).toHtml();
    html = html.substring(html.indexOf("border='1'>") + 11);
    int rowNumber = org.apache.commons.lang3.StringUtils.countMatches(html, "/tr");
    return rowNumber;
}
I would like to get all PDF data into "html" field. Could you please help?
Thanks, Phan Nguyen
Hi Phan Nguyen,
I think you can do it by getting the HTML content of the tables on all pages, then using an HTML parser such as Jsoup to parse the table content and put it all together. Alternatively, you can loop through all the table models returned by PDFTableExtractor.extract().
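A minimal sketch of the second suggestion, in case it helps. It assumes you have already collected one HTML string per table by calling toHtml() on each element of the list returned by extract(), as in your code. The class and method names (AllPagesHtml, joinPages, countRows) are made up for illustration and are not part of traprange, and the sample fragments only stand in for real toHtml() output, whose exact markup I have not checked.

```java
import java.util.Arrays;
import java.util.List;

class AllPagesHtml {

    /** Concatenates the per-page HTML fragments into one string, in page order. */
    static String joinPages(List<String> pageHtml) {
        StringBuilder all = new StringBuilder();
        for (String html : pageHtml) {
            all.append(html).append('\n');
        }
        return all.toString();
    }

    /** Counts closing </tr> tags (assumed lowercase) in the combined HTML. */
    static int countRows(String html) {
        final String closingTag = "</tr>";
        int count = 0;
        int from = 0;
        while ((from = html.indexOf(closingTag, from)) != -1) {
            count++;
            from += closingTag.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in fragments. With traprange they would be filled like this:
        //   for (Table t : extractor.setSource(pdfLink).extract()) {
        //       pages.add(t.toHtml());
        //   }
        List<String> pages = Arrays.asList(
            "<table border='1'><tr><td>a</td></tr><tr><td>b</td></tr></table>",
            "<table border='1'><tr><td>c</td></tr></table>");

        String html = joinPages(pages);
        System.out.println(countRows(html)); // prints 3
    }
}
```

The combined string still contains one table element per page. If you need a single merged table, that is where a parser such as Jsoup would come in.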
Sorry for my late reply.
Regards, Tho Q Luong