pdfboxing icon indicating copy to clipboard operation
pdfboxing copied to clipboard

Issues with `merge-pddocuments`

Open alanmarazzi opened this issue 6 years ago • 16 comments

Opening a new issue after discussion in #26

(pdf-split/merge-pddocuments
  :docs (pdf-split/split-pdf :input path :start 1 :end 4)
  :output "test.pdf")

Unhandled java.io.IOException
   COSStream has been closed and cannot be read. Perhaps its enclosing
   PDDocument has been closed?

Please check the mentioned issue for further details

alanmarazzi avatar Aug 02 '18 08:08 alanmarazzi

bump

aaka3207 avatar Oct 16 '18 23:10 aaka3207

I'm also experiencing the same issue using pdfboxing version 0.1.14.1-SNAPSHOT. Here's a more distilled example using .save instead of split/merge-documents:

(-> (pdf/split-pdf :input "/tmp/example.pdf")
  first 
  (.save "/tmp/page-1.pdf"))
Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:82).
COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?

@alanmarazzi I know it's been a year, but were you able to resolve the issue? @aaka3207 any luck on your side?

joshkh avatar Jul 06 '19 12:07 joshkh

Nope unfortunately. I ended up using something else

Il Sab 6 Lug 2019, 14:38 Joshua Heimbach [email protected] ha scritto:

I'm also experiencing the same issue using pdfboxing version 0.1.14.1-SNAPSHOT. Here's a more distilled example using .save instead of split/merge-documents:

(-> (pdf/split-pdf :input "/tmp/example.pdf") first (.save "/tmp/page-1.pdf")) Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:82). COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?

@alanmarazzi https://github.com/alanmarazzi I know it's been a year, but were you able to resolve the issue? @aaka3207 https://github.com/aaka3207 any luck on your side?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotemacs/pdfboxing/issues/46?email_source=notifications&email_token=ADMDEQR6LKIZEF3C4ZR7XZ3P6CG27A5CNFSM4FNPPDI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZKY45Y#issuecomment-508923511, or mute the thread https://github.com/notifications/unsubscribe-auth/ADMDEQTTMTXGDZBIEWIYTZLP6CG27ANCNFSM4FNPPDIQ .

alanmarazzi avatar Jul 06 '19 13:07 alanmarazzi

Nope unfortunately. I ended up using something else

Cheers for the quick reply.

joshkh avatar Jul 06 '19 13:07 joshkh

Thank you @alanmarazzi for raising this issue, @aaka3207 & @joshkh for the nudges.

Looking at the code, it seems that this regression was introduced with the version 0.1.11, in particular this line:

https://github.com/dotemacs/pdfboxing/commit/2ed58248360788cbdd4a4b5dbc95eadaddd9c767#diff-4c66c23479d6f9f33283c8cf88c003d0R49

That change with with-open, does fix the issue of closing the file being split.

The problem that it introduced, as you've seen it already, is that those pages, which are instances of PDDocuments, can't then be used in another function as input, as the file they are in has already been closed.

The thing is, I'm not sure when somebody would use split-pdf just to view the PDDocuments. So maybe it's OK to not use with-open and not close the file being split in split-pdf, if the result of the split is to be passed into another function which will then save the desired pages, just like @joshkh was trying to do with:

(-> (pdf/split-pdf :input "/tmp/example.pdf")
  first 
  (.save "/tmp/page-1.pdf"))

I'd like to get your thoughts on this issue before I apply a fix, so please let me know.

Thanks

dotemacs avatar Jul 08 '19 19:07 dotemacs

The change that fixes this issue is here: https://github.com/dotemacs/pdfboxing/commit/71696b2ae9a91196041538deae689533d32d6e7d

Thanks

dotemacs avatar Jul 08 '19 19:07 dotemacs

@dotemacs Thanks for looking into this!

The thing is, I'm not sure when somebody would use split-pdf just to view the PDDocuments

I had the exact same thought. If users really want a collection of closed files then perhaps split-pdf could accept an option to close them, or the other way around -- an option to keep them open. Either certainly works for me!

Thanks for the great library.

joshkh avatar Jul 08 '19 19:07 joshkh

The thing is, I'm not sure when somebody would use split-pdf just to view the PDDocuments

I had the exact same thought. If users really want a collection of closed files then perhaps split-pdf could accept an option to close them, or the other way around -- an option to keep them open. Either certainly works for me!

I was thinking of having an option for specifying that the file should be closed...

But maybe just leave it as is and if somebody really needs it they can always open an issue or submit a PR?

Thanks for the great library.

Thanks for the kind words :)

dotemacs avatar Jul 08 '19 20:07 dotemacs

I'm in favour of leaving it as is (after your patch). Nothing is stopping a user from closing the files on their own accord. :)

Any chance of a release to Clojars?

joshkh avatar Jul 08 '19 20:07 joshkh

I'm in favour of leaving it as is (after your patch). Nothing is stopping a user from closing the files on their own accord. :)

True, thanks for making this easy for me :)

But maybe others have some other ideas/suggestions?

Any chance of a release to Clojars?

Since the project has been languishing a bit, I should do it. But not tonight. I'll do it tomorrow for sure, as there are some other issues to address also, such as the new pdfbox releases...

Thanks :)

dotemacs avatar Jul 08 '19 20:07 dotemacs

I've released a new version 0.1.14 on clojars with the fix for the above issue, as discussed yesterday.

Thank you @alanmarazzi, @aaka3207 & @joshkh for your patience and feedback.

dotemacs avatar Jul 09 '19 21:07 dotemacs

Hi @dotemacs

Are we sure this was fixed? I'm using the latest version from Clojars 0.1.14 but I'm having the same problem:

(-> (pdf/split-pdf :input "/tmp/my.pdf")
    first
    (.save "/tmp/page-1.pdf"))

Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:83).
COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?

joshkh avatar Jul 15 '19 10:07 joshkh

Hi @dotemacs

Are we sure this was fixed? I'm using the latest version from Clojars 0.1.14 but I'm having the same problem:

(-> (pdf/split-pdf :input "/tmp/my.pdf")
    first
    (.save "/tmp/page-1.pdf"))

Execution error (IOException) at org.apache.pdfbox.cos.COSStream/checkClosed (COSStream.java:83).
COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?

I thought that it was, even with this test in place it still passes:

https://github.com/dotemacs/pdfboxing/commit/71696b2ae9a91196041538deae689533d32d6e7d#diff-fb3a651e247bd56798ada20f002042ccR38

Let me revisit this again. If for some reason you don't hear back from me, please shout.

Sorry about this

dotemacs avatar Jul 15 '19 10:07 dotemacs

No worries! I saw the fix-for-split-pdf branch, tested it, and sure enough it works. Maybe the release to Clojars was done with the master branch?

joshkh avatar Jul 15 '19 10:07 joshkh

I've made a fork here with: https://github.com/joshkh/pdfboxing

  • added deps.edn for wider tooling support
  • merged your fix-for-split-pdf into master

And when I include the latest SHA in my project it works as expected. :)

{:deps {pdfboxing {:sha     "ec1d5b06dafeb9886f2a55a1754632cfd177b824"
                   :git/url "[email protected]:joshkh/pdfboxing.git"}}}
(-> (pdf/split-pdf :input "/tmp/my.pdf")
  first
  (.save "/tmp/page-1.pdf"))
=> nil

joshkh avatar Jul 15 '19 10:07 joshkh

It looks like that might be the issue...

Thanks for looking into this and sorry about this slip up.

Feeling a bit drained, will check this first thing in the morning.

dotemacs avatar Jul 15 '19 16:07 dotemacs

Hello @joshkh @alanmarazzi @aaka3207

I've merged https://github.com/dotemacs/pdfboxing/pull/73 which should resolve this issue.

Sorry for the delay on this and thanks for your feedback.

Please let me know if you still experience the regression.

Thanks

dotemacs avatar Mar 12 '23 07:03 dotemacs