OCRmyPDF

Refactor code that tramples previous variable assignment (was: possible bug)

Open • kshpytsya opened this issue 2 years ago • 1 comment

I may be missing something, but it seems that the value of self._has_text set in this section of code:

https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L779-L788

is always overwritten here:

https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L804-L822
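A minimal, hypothetical reduction of the reported pattern follows. PageInfoSketch and its parameters (detailed_analysis, slow_result, fast_result) are invented for illustration and are not OCRmyPDF's actual code. The point is only that an unconditional second assignment discards whatever the conditional first one stored (a "dead store").

```python
class PageInfoSketch:
    """Hypothetical stand-in for a page-info object; not OCRmyPDF's PageInfo."""

    def __init__(self, detailed_analysis: bool, slow_result: bool, fast_result: bool):
        if detailed_analysis:
            # First assignment: result of a (hypothetical) slow, detailed analysis.
            self._has_text = slow_result
        # Second assignment runs on every path, so the value above is always replaced.
        self._has_text = fast_result


page = PageInfoSketch(detailed_analysis=True, slow_result=True, fast_result=False)
print(page._has_text)  # False: the detailed result was trampled
```

In this reduction the detailed result never survives, so the only question that matters is whether the two analyses can ever disagree.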

kshpytsya, Jul 18 '22 19:07

Kind of? This ambiguity is not intended, but it looks to me like it's going to end up doing the right thing anyway.

If the first if block executes, the second if block should compute the same value for self._has_text anyway, unless there is a bug in one of the two implementations. The former uses pdfminer.six to run a much slower analysis that figures out where all the text is on the page; the latter only checks for the presence of vector graphics, text, and images. So that boolean should come out the same either way.
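One way to act on the issue title without changing behavior is to compute the boolean in one place and make the relationship between the two analyses explicit. This is a hypothetical sketch: has_text_from_layout and has_text_from_content_stream are stand-ins for the slower pdfminer.six-based analysis and the faster content-stream scan, and their inputs (lists of operator names and text-box strings) are simplifying assumptions, not OCRmyPDF or pdfminer.six API. The fast scan stays authoritative, matching what the current code effectively does, and the detailed result is used only to flag a disagreement.

```python
from __future__ import annotations

import warnings
from typing import Optional

# PDF text-showing operators: Tj, TJ, ' and ".
TEXT_SHOW_OPERATORS = {"Tj", "TJ", "'", '"'}


def has_text_from_content_stream(operators: list[str]) -> bool:
    """Hypothetical fast check: does the content stream contain a text-showing operator?"""
    return any(op in TEXT_SHOW_OPERATORS for op in operators)


def has_text_from_layout(textboxes: list[str]) -> bool:
    """Hypothetical slow check: did a layout analysis find any non-blank text box?"""
    return any(box.strip() for box in textboxes)


def decide_has_text(operators: list[str], textboxes: Optional[list[str]] = None) -> bool:
    """Compute has_text in one place; cross-check it only if detailed analysis ran."""
    has_text = has_text_from_content_stream(operators)
    if textboxes is not None:
        detailed = has_text_from_layout(textboxes)
        if detailed != has_text:
            # A mismatch may point to a bug in one of the two implementations.
            warnings.warn(
                f"text detection mismatch: content stream={has_text}, layout={detailed}"
            )
    return has_text


print(decide_has_text(["BT", "Tj", "ET"], ["Hello"]))  # True
print(decide_has_text(["re", "f"]))  # False: no detailed analysis, so no cross-check
```

Warning rather than raising keeps a mismatch visible without aborting a run; whether that trade-off suits the project is an assumption.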

If you have something that demonstrates a likely issue here, please let me know.

jbarlow83, Jul 19 '22 01:07