pdfboxing
pdfboxing copied to clipboard
Issues with `merge-pddocuments`
Opening a new issue after discussion in #26
(pdf-split/merge-pddocuments
:docs (pdf-split/split-pdf :input path :start 1 :end 4)
:output "test.pdf")
Unhandled java.io.IOException
COSStream has been closed and cannot be read. Perhaps its enclosing
PDDocument has been closed?
Please check the mentioned issue for further details
bump
I'm also experiencing the same issue using pdfboxing
version 0.1.14.1-SNAPSHOT
. Here's a more distilled example using .save
instead of split/merge-documents
:
(-> (pdf/split-pdf :input "/tmp/example.pdf")
first
(.save "/tmp/page-1.pdf"))
Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:82).
COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
@alanmarazzi I know it's been a year, but were you able to resolve the issue? @aaka3207 any luck on your side?
Nope unfortunately. I ended up using something else
Il Sab 6 Lug 2019, 14:38 Joshua Heimbach [email protected] ha scritto:
I'm also experiencing the same issue using pdfboxing version 0.1.14.1-SNAPSHOT. Here's a more distilled example using .save instead of split/merge-documents:
(-> (pdf/split-pdf :input "/tmp/example.pdf") first (.save "/tmp/page-1.pdf")) Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:82). COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
@alanmarazzi https://github.com/alanmarazzi I know it's been a year, but were you able to resolve the issue? @aaka3207 https://github.com/aaka3207 any luck on your side?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotemacs/pdfboxing/issues/46?email_source=notifications&email_token=ADMDEQR6LKIZEF3C4ZR7XZ3P6CG27A5CNFSM4FNPPDI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZKY45Y#issuecomment-508923511, or mute the thread https://github.com/notifications/unsubscribe-auth/ADMDEQTTMTXGDZBIEWIYTZLP6CG27ANCNFSM4FNPPDIQ .
Nope unfortunately. I ended up using something else
Cheers for the quick reply.
Thank you @alanmarazzi for raising this issue, @aaka3207 & @joshkh for the nudges.
Looking at the code, it seems that this regression was introduced with the version 0.1.11
, in particular this line:
https://github.com/dotemacs/pdfboxing/commit/2ed58248360788cbdd4a4b5dbc95eadaddd9c767#diff-4c66c23479d6f9f33283c8cf88c003d0R49
That change with with-open
, does fix the issue of closing the file being split.
The problem that it introduced, as you've seen it already, is that those pages, which are instances of PDDocuments
, can't then be used in another function as input, as the file they are in has already been closed.
The thing is, I'm not sure when somebody would use split-pdf
just to view the PDDocuments
. So maybe it's OK to not use with-open
and not close the file being split in split-pdf
, if the result of the split is to be passed into another function which will then save the desired pages, just like @joshkh was trying to do with:
(-> (pdf/split-pdf :input "/tmp/example.pdf")
first
(.save "/tmp/page-1.pdf"))
I'd like to get your thoughts on this issue before I apply a fix, so please let me know.
Thanks
The change that fixes this issue is here: https://github.com/dotemacs/pdfboxing/commit/71696b2ae9a91196041538deae689533d32d6e7d
Thanks
@dotemacs Thanks for looking into this!
The thing is, I'm not sure when somebody would use split-pdf just to view the PDDocuments
I had the exact same thought. If users really want a collection of closed files then perhaps split-pdf
could accept an option to close them, or the other way around -- an option to keep them open. Either certainly works for me!
Thanks for the great library.
The thing is, I'm not sure when somebody would use split-pdf just to view the PDDocuments
I had the exact same thought. If users really want a collection of closed files then perhaps
split-pdf
could accept an option to close them, or the other way around -- an option to keep them open. Either certainly works for me!
I was thinking of having an option for specifying that the file should be closed...
But maybe just leave it as is and if somebody really needs it they can always open an issue or submit a PR?
Thanks for the great library.
Thanks for the kind words :)
I'm in favour of leaving it as is (after your patch). Nothing is stopping a user from closing the files on their own accord. :)
Any chance of a release to Clojars?
I'm in favour of leaving it as is (after your patch). Nothing is stopping a user from closing the files on their own accord. :)
True, thanks for making this easy for me :)
But maybe others have some other ideas/suggestions?
Any chance of a release to Clojars?
Since the project has been languishing a bit, I should do it. But not tonight. I'll do it tomorrow for sure, as there are some other issues to address also, such as the new pdfbox
releases...
Thanks :)
I've released a new version 0.1.14
on clojars with the fix for the above issue, as discussed yesterday.
Thank you @alanmarazzi, @aaka3207 & @joshkh for your patience and feedback.
Hi @dotemacs
Are we sure this was fixed? I'm using the latest version from Clojars 0.1.14
but I'm having the same problem:
(-> (pdf/split-pdf :input "/tmp/my.pdf")
first
(.save "/tmp/page-1.pdf"))
Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:83).
COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
Hi @dotemacs
Are we sure this was fixed? I'm using the latest version from Clojars
0.1.14
but I'm having the same problem:(-> (pdf/split-pdf :input "/tmp/my.pdf") first (.save "/tmp/page-1.pdf")) Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:83). COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
I thought that it was, even with this test in place it still passes:
https://github.com/dotemacs/pdfboxing/commit/71696b2ae9a91196041538deae689533d32d6e7d#diff-fb3a651e247bd56798ada20f002042ccR38
Let me revisit this again. If for some reason you don't hear back from me, please shout.
Sorry about this
No worries! I saw the fix-for-split-pdf
branch, tested it, and sure enough it works. Maybe the release to Clojars was done with the master branch?
I've made a fork here with: https://github.com/joshkh/pdfboxing
- added deps.edn for wider tooling support
- merged your
fix-for-split-pdf
intomaster
And when I include the latest SHA in my project it works as expected. :)
{:deps {pdfboxing {:sha "ec1d5b06dafeb9886f2a55a1754632cfd177b824"
:git/url "[email protected]:joshkh/pdfboxing.git"}}}
(-> (pdf/split-pdf :input "/tmp/my.pdf")
first
(.save "/tmp/page-1.pdf"))
=> nil
It looks like that might be the issue...
Thanks for looking into this and sorry about this slip up.
Feeling a bit drained, will check this first thing in the morning.
Hello @joshkh @alanmarazzi @aaka3207
I've merged https://github.com/dotemacs/pdfboxing/pull/73 which should resolve this issue.
Sorry for the delay on this and thanks for your feedback.
Please let me know if you still experience the regression.
Thanks