pdfbox
Update PDFStreamEngine.java
There is no need to allocate a new ArrayList here. Reusing the existing list reduces text extraction time from 16 seconds to 14 seconds on a 4.2M PDF.
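The idea can be sketched roughly as follows. This is not the actual PDFStreamEngine code: the class name, the OperatorHandler interface, and the string tokens are hypothetical stand-ins, and the only point is that the operand list is cleared after each operator instead of being replaced by a new ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not PDFBox code: a token loop that reuses one operand list.
class ReuseArgumentsSketch {

    // Hypothetical callback standing in for a processOperator-style method.
    interface OperatorHandler {
        void process(String operator, List<String> operands);
    }

    // Assumption for this sketch: operators start with a letter, operands do not.
    static boolean isOperator(String token) {
        return !token.isEmpty() && Character.isLetter(token.charAt(0));
    }

    // Collects operands until an operator is seen, hands both to the handler,
    // then empties the same list for the next operator.
    static int parse(List<String> tokens, OperatorHandler handler) {
        List<String> arguments = new ArrayList<>();
        int operatorCount = 0;
        for (String token : tokens) {
            if (isOperator(token)) {
                handler.process(token, arguments);
                arguments.clear(); // proposed: reuse; previously: arguments = new ArrayList<>();
                operatorCount++;
            } else {
                arguments.add(token);
            }
        }
        return operatorCount;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("1", "0", "0", "rg", "10", "20", "m");
        parse(tokens, (op, operands) -> System.out.println(op + " " + operands));
    }
}
```

The handler sees the same list instance on every call, which is exactly why the question of handlers keeping the list matters.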
This is a read-only mirror. Please close this and open an issue in JIRA. https://issues.apache.org/jira/browse/PDFBOX
Of course every speed increase is welcome, but this change is one to be discussed with "the rest of the gang". What if one of the processOperator methods keeps the argument list? If not now, maybe at a later time? Your change would pull the rug out from under it.
@THausherr What do you mean by keeping the argument list? I assume you mean that someone wants to keep the elements of arguments inside processOperator. In that case, the clear method only removes the elements from arguments, it does not destroy them, so if someone keeps a reference to the elements, it will still work.
Any progress on this? The users of the passed array must make a copy of the arguments array.
No progress, this is a read-only mirror. I asked for an issue to be created in JIRA. I won't create it myself because I'm not persuaded by this. If "The users of the passed array must make a copy of the arguments array", then where would the speed gain be?
I should have written: The users of the passed array that need to keep a list of the arguments must make a copy of the arguments array. However, I agree that this kind of optimization must be investigated further, so that there are no unexpected side effects.
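To make the concern concrete, here is a minimal sketch with hypothetical names (it does not use any PDFBox classes). It shows what each kind of retained value looks like after the caller clears the shared list: a kept reference to the list itself ends up empty, while a defensive copy and a kept element are unaffected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: what a callee still holds after the caller calls clear().
class RetainedArgumentsSketch {
    static List<String> keptReference; // same list object as the caller's
    static List<String> keptCopy;      // independent copy made by the callee
    static String keptElement;         // reference to a single element

    // Stand-in for a processOperator-style method that retains its arguments.
    static void processOperator(List<String> arguments) {
        keptReference = arguments;
        keptCopy = new ArrayList<>(arguments);
        keptElement = arguments.get(0);
    }

    static void run() {
        List<String> arguments = new ArrayList<>(List.of("10", "20"));
        processOperator(arguments);
        arguments.clear(); // the caller reuses the list for the next operator
    }

    public static void main(String[] args) {
        run();
        System.out.println("kept reference: " + keptReference);
        System.out.println("kept copy:      " + keptCopy);
        System.out.println("kept element:   " + keptElement);
    }
}
```

Only callers that need the list after returning pay for the copy, so the common path would still avoid one allocation per operator.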
I've created https://github.com/apache/pdfbox/pull/38 which investigates whether the ArrayList is still in use after the call to the processor. My first impression is that it is not, and that the optimization is possible.