pyPdf

Issue about the _sweepIndirectReferences function

Open Eladio opened this issue 14 years ago • 1 comment

I think there's a small problem in the PdfFileWriter class's _sweepIndirectReferences function. It keeps a list called self.stack that stores the indirect references we have already seen. I suppose it is there so that we don't sweep the same indirect reference over and over again. However, once a reference has been swept, the function removes it from self.stack, and I don't see the point of that. If lots of objects reference the same object (for example, if we copy the Logical Structure of the PDF as well, many objects reference the same page object, which is quite expensive to sweep), keeping it in self.stack could mean a significant improvement in time.

if data.pdf == self:
    if data.idnum in self.stack:
        return data
    else:
        self.stack.append(data.idnum)
        realdata = self.getObject(data)
        self._sweepIndirectReferences(externMap, realdata)
        self.stack.pop()
        return data

I think it should be:

if data.pdf == self:
    if data.idnum in self.stack:
        return data
    else:
        self.stack.append(data.idnum)
        realdata = self.getObject(data)
        self._sweepIndirectReferences(externMap, realdata)
        return data
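
To illustrate the cost difference, here is a toy model of the two variants. It is only a sketch, not pyPdf code: the dict-based graph and the names sweep, count_sweeps and diamond_chain are made up for this example, and the pop_after flag switches between the current behaviour (pop after sweeping) and the proposed one (keep the id). Each level of the test graph references the next level twice, which stands in for many objects pointing at the same page.

```python
def sweep(graph, node, stack, counter, pop_after):
    """Toy stand-in for a recursive reference sweep over a dict-based graph."""
    if node in stack:
        # Already recorded: skip it (this is also what breaks cycles).
        return
    stack.append(node)
    counter[node] = counter.get(node, 0) + 1
    for child in graph[node]:
        sweep(graph, child, stack, counter, pop_after)
    if pop_after:
        # Current behaviour: forget the node once its subtree is done,
        # so a later reference to it triggers a full re-sweep.
        stack.pop()


def count_sweeps(graph, root, pop_after):
    """Return the total number of node visits for one sweep from root."""
    counter = {}
    sweep(graph, root, [], counter, pop_after)
    return sum(counter.values())


def diamond_chain(levels):
    """Build a hypothetical graph where each level is referenced twice."""
    graph = {}
    for i in range(levels):
        graph[f"n{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [f"n{i + 1}"]
        graph[f"b{i}"] = [f"n{i + 1}"]
    graph[f"n{levels}"] = []
    return graph


graph = diamond_chain(10)
print(count_sweeps(graph, "n0", pop_after=True))   # 4093 visits
print(count_sweeps(graph, "n0", pop_after=False))  # 31 visits
```

With the pop, the number of visits doubles at every level of sharing; without it, each object is visited once. If the list is kept as a record of everything seen, a set would probably be a better fit than a list, since the membership test on a list gets slower as it grows, but that is a separate change from the one proposed here.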

Eladio avatar Jun 17 '11 14:06 Eladio

Hi,

This fixes an issue I was having with certain kinds of documents getting stuck in a recursive loop in the self._sweepIndirectReferences call.

AeroNotix avatar Nov 27 '11 22:11 AeroNotix