PyPDF4
PdfFileWriter.write causes access to non-existent attribute in pdf.py
The attached code causes an exception when it executes the output.write(outfile) statement at line 58. The program appears to work with PyPDF2.
The zip file also includes a data file (you20.pdf) that errors, and one that doesn't (CleanedUOSSSimpleSabotage_sm.pdf) in case this helps track down the bug. Here's the traceback from a failed attempt:
[2018-12-15 10:57:32,725] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "app.py", line 58, in get_or_post
    output.write(outfile)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 557, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 589, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 575, in _sweepIndirectReferences
    if data.pdf.stream.closed:
AttributeError: 'PdfFileWriter' object has no attribute 'stream'
Sorry about the zip file, but GitHub doesn't allow direct uploading of .py or .tar files.
bug_report.zip
- Yikes.
- Thanks.
- I'm backed up. It might be a few days before I look at this.
Cameron Laird, vice president We make computers work for people.
On Sat, Dec 15, 2018 at 4:08 AM Steve Holden wrote:
Sorry about the zip file, but GitHub doesn't allow direct uploading of .py or .tar files. bug_report.zip https://github.com/claird/PyPDF4/files/2682558/bug_report.zip
Hi @holdenweb, thanks for the report. As you might know, PyPDF v4 is still undergoing a (slow) restructuring phase. Apparently I've been the only one working on its enhancement in the last few months.
One of my primary concerns was to make the codebase much cleaner and more maintainable. We are familiar with these kinds of errors; it just takes time (which I lack at the moment) to solve them.
Meanwhile, PyPDF2 might be a more stable, albeit obsolete, choice.
No worries - this is just a data point about a PyPDF program I was testing before submitting a PR. Perfectly happy to continue using PyPDF2.
I have the same problem and hope someone can help me solve it.
Here's a workaround that worked for at least one use case. Maybe it will work for yours.
The problem seems to be that PdfFileWriter looks for a 'stream' attribute on the PdfFileWriter instance when performing some cleanup steps (_sweepIndirectReferences), and an error occurs because the PdfFileWriter class (as of 1.27.0) has no such attribute. However, that 'stream' attribute isn't referenced again in _sweepIndirectReferences.
A potentially viable workaround (until a fix for this issue is released) would be to create a wrapper class which extends PdfFileWriter with a 'stream' attribute, with its value set to an instance of BytesIO.
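To make the mechanism concrete, here is a small stdlib-only sketch of why adding the attribute gets past the check. It does not use PyPDF4 at all: StandInWriter, StandInWriterWithStream, StandInReference and check_owner_stream are hypothetical names invented for illustration, and the check is only modeled on the last traceback frame (if data.pdf.stream.closed:), not copied from pdf.py.

```python
from io import BytesIO


class StandInWriter:
    """Hypothetical stand-in for a writer object with no 'stream' attribute."""


class StandInWriterWithStream(StandInWriter):
    """Same stand-in, extended with an open in-memory stream."""

    def __init__(self):
        super().__init__()
        self.stream = BytesIO()


class StandInReference:
    """Hypothetical stand-in for a reference that records its owner in .pdf."""

    def __init__(self, pdf):
        self.pdf = pdf


def check_owner_stream(data):
    """Model of the failing check: it reads data.pdf.stream.closed."""
    if data.pdf.stream.closed:
        raise ValueError("owner stream is closed")
    return "ok"


def describe(owner):
    """Run the check and report the outcome as a string instead of raising."""
    try:
        return check_owner_stream(StandInReference(owner))
    except AttributeError as exc:
        return "AttributeError: " + str(exc)


plain_result = describe(StandInWriter())
patched_result = describe(StandInWriterWithStream())
print(plain_result)    # the attribute lookup fails before .closed is read
print(patched_result)  # the open BytesIO satisfies the check
```

The point is only that the lookup fails on the missing attribute itself, so any object with a .closed property in that slot is enough to get through this particular check. Whether that is safe for the rest of PyPDF4's write path is a separate question.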
Use at your own risk. This is simply a workaround, which works in one case, but may or may not work for your case.
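from io import BytesIO

from PyPDF4 import PdfFileWriter


class PdfFileWriterWithStreamAttribute(PdfFileWriter):
    def __init__(self):
        super().__init__()
        # _sweepIndirectReferences reads data.pdf.stream.closed, so give the
        # writer an open in-memory stream to satisfy that check.
        self.stream = BytesIO()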
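If subclassing is inconvenient, the same idea can probably be applied per instance. The sketch below is an assumption-laden variant, not something from PyPDF4: ensure_stream_attribute is a hypothetical helper, and DummyWriter exists only so the snippet runs without PyPDF4 installed. It assumes the writer object accepts new attributes, which ordinary Python classes do.

```python
from io import BytesIO


def ensure_stream_attribute(writer):
    """Attach an open BytesIO as .stream if the object has none; return it."""
    if not hasattr(writer, "stream"):
        writer.stream = BytesIO()
    return writer


class DummyWriter:
    """Stand-in used only to exercise the helper without PyPDF4."""


patched = ensure_stream_attribute(DummyWriter())
print(patched.stream.closed)

# An object that already has a stream is left untouched.
existing = DummyWriter()
existing.stream = BytesIO(b"keep me")
same = ensure_stream_attribute(existing)

# With PyPDF4 installed, the intended (untested here) use would be:
#   output = ensure_stream_attribute(PdfFileWriter())
#   ... add pages ...
#   output.write(outfile)
```

The same caveat applies as for the wrapper class: this only gets past the attribute lookup and may or may not work for your case.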
Hmm, sensible workaround. Seems like the bug classification is correct!