PDFLayoutTextStripper
Tail characters getting stripped off
I am working with a large number of PDF reports. Your class preserves the layout, but the tail characters of some lines are sometimes stripped off, whereas the parent class, PDFTextStripper, extracts them fine.
Does this have anything to do with this.setCurrentPageWidth(pageRectangle.getWidth()); ?
By the way, great work with the class; it made the process of extracting tables so easy.
Hi! Thank you, great to hear that the class helps you! Can you send me (here or through my email) a PDF file which doesn't work?
` Independent Auditors Report
To members of Silverlake Axis Ltd.
rePort on the finAnciAl stAtements
We have audited the accompanying fnancial statements of Silverlake Axis Ltd. and its subsidiaries (collectively, the Group),
50 to 159, which comprise the statements of fnancial position of the Group and the Company as at 30 June 2016, the co
of changes in equity, consolidated income statement, consolidated statement of comprehensive income and consolidated statement
fows of the Group for the year then ended, and a summary of signifcant accounting policies and other explanatory informatio
Management’s Responsibility for the Financial Statements
Management is responsible for the preparation of fnancial statements that give a true and fair view in accordance with Int
Reporting Standards, and for devising and maintaining a system of internal accounting controls suffcient to provide a **reaso**
that assets are safeguarded against loss from unauthorised use or disposition; and transactions are properly authorised and
recorded as necessary to permit the preparation of true and fair consolidated income statement and statements of fnancial
maintain accountability of assets.
Auditors’ Responsibility
Our responsibility is to express an opinion on these fnancial statements based on our audit. We conducted our audit in ac
International Standards on Auditing. Those standards require that we comply with ethical requirements and plan and perform
obtain reasonable assurance about whether the consolidated fnancial statements are free from material misstatement.
An audit involves performing procedures to obtain audit evidence about the amounts and disclosures in the consolidated **fnan**
The procedures selected depend on the auditor’s judgement, including the assessment of risks of material misstatement of the
fnancial statements, whether due to fraud or error. In making those risk assessments, the auditor considers internal control
the entity’s preparation of the consolidated fnancial statements that give a true and fair view in order to design audit
appropriate in the circumstances, but not for the purpose of expressing an opinion on the effectiveness of the entity’s intern
audit also includes evaluating the appropriateness of accounting policies used and the reasonableness of accounting estimates
management, as well as evaluating the overall presentation of the consolidated fnancial statements.
We believe that the audit evidence we have obtained is suffcient and appropriate to provide a basis for our audit opinion.
Opinion
In our opinion, the consolidated fnancial statements of the Group and the statement of fnancial position of the Company a
up in accordance with the International Financial Reporting Standards so as to give a true and fair view of the **fnancial**
and of the Company as at 30 June 2016 and the results, changes in equity and cash **fows** of the Group for the year ended
other mAtters
This report is made solely to the members of the Company, as a body, and for no other purpose. We do not assume **responsib**
person for the content of this report.
eRNSt & YouNG
AF: 0039
Chartered Accountants
Kuala Lumpur, Malaysia
28 September 2016 `
This is what the extracted text looks like. If you look closely, a few characters are missing from some words, and I have highlighted the places where the tail characters are getting stripped.
I have attached the file as well; the extract above is from page 51. Thanks.
Thanks, I am going to look into this over the weekend.
Thanks a lot. I was wondering if you could explain why certain characters were getting stripped.
You were right, it has to do with this.setCurrentPageWidth(pageRectangle.getWidth()); I'll make an update, but meanwhile you can change that line to: this.setCurrentPageWidth(pageRectangle.getWidth() * 1.2); I also noticed that the space between the columns was sometimes not big enough (for instance on page 6). I'll try to fix that too.
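To illustrate why widening the page width can help, here is a simplified model of a fixed-width output line. It is a sketch under assumptions, not the actual PDFLayoutTextStripper code: the class `FixedWidthLine`, the constant `CHAR_WIDTH_PT`, and the "shift right on collision" rule are all hypothetical. The idea it shows is that when glyphs are narrower than one output column, characters get pushed to the right, and anything pushed past the end of a buffer sized from the page width is dropped.

```java
import java.util.Arrays;

/** Simplified, hypothetical model of a layout-preserving output line. */
final class FixedWidthLine {
    // Assumed width of one output column in PDF points (illustrative value).
    static final double CHAR_WIDTH_PT = 4.0;

    private final char[] cells;
    private final boolean[] occupied;

    FixedWidthLine(double pageWidthPt) {
        int columns = (int) (pageWidthPt / CHAR_WIDTH_PT);
        cells = new char[columns];
        occupied = new boolean[columns];
        Arrays.fill(cells, ' ');
    }

    /** Places c at the first free column at or after its x position; returns false if dropped. */
    boolean put(char c, double xPt) {
        int column = Math.max(0, (int) (xPt / CHAR_WIDTH_PT));
        while (column < cells.length && occupied[column]) {
            column++; // column already taken: shift right
        }
        if (column >= cells.length) {
            return false; // past the end of the buffer: the character is lost
        }
        cells[column] = c;
        occupied[column] = true;
        return true;
    }

    String text() {
        return new String(cells).replaceAll("\\s+$", "");
    }

    /** Lays out text whose glyphs advance by advancePt on a page of the given width. */
    static String render(String text, double advancePt, double pageWidthPt) {
        FixedWidthLine line = new FixedWidthLine(pageWidthPt);
        for (int i = 0; i < text.length(); i++) {
            line.put(text.charAt(i), i * advancePt);
        }
        return line.text();
    }

    public static void main(String[] args) {
        // Glyphs advance 3pt but each output column represents 4pt.
        System.out.println(render("responsibility", 3.0, 48.0));       // responsibili
        System.out.println(render("responsibility", 3.0, 48.0 * 1.2)); // responsibility
    }
}
```

In this model the 1.2 factor simply adds columns at the right edge, so it hides the overflow rather than removing its cause; a line with enough narrow glyphs could still be truncated.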
@JonathanLink: Thanks for this class, very useful.
About your last commit (88bfd8c): I see it's still in the 'dev' branch and hasn't been merged to 'master'. Is there any reason for that?
Also, is there any way to set the page width externally (e.g., by calling pdflayouttextstripper.setPageWidth() or something like that)? That would be very useful for deciding case by case how to behave...
Thanks, MZ