pdfx "URI" in PDF attributes may be a string itself

Open theiostream opened this issue 7 years ago • 1 comment

The URI value in an attribute object may itself be a string rather than a PDFObjRef. If this case is not handled, many URIs are silently ignored. The following patch fixed the issue for me, but a better solution may be desirable:

@@ -282,16 +279,22 @@ class PDFMinerBackend(ReaderBackend):
         if isinstance(obj_resolved, list):
             return [self.resolve_PDFObjRef(o) for o in obj_resolved]

+        print(obj_resolved)
         if "URI" in obj_resolved:
             if isinstance(obj_resolved["URI"], PDFObjRef):
                 return self.resolve_PDFObjRef(obj_resolved["URI"])
+            elif isinstance(obj_resolved["URI"], (str, unicode)):
+                if IS_PY2:
+                    ref = obj_resolved["URI"].decode("utf-8")
+                else:
+                    ref = obj_resolved["URI"]
+                return Reference(ref, self.curpage)
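
To make the three cases explicit (indirect reference, text string, byte string), here is a rough standalone sketch in Python 3. It is not pdfx's actual code: `FakeObjRef` and `extract_uri` are hypothetical names used only for illustration, and the UTF-8 decoding with a latin-1 fallback is an assumption about how the bytes should be interpreted.

```python
class FakeObjRef:
    """Hypothetical stand-in for an indirect reference such as PDFObjRef."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # Return the object this reference points to.
        return self._target


def extract_uri(attrs):
    """Return the "URI" entry of an attribute dict as text, or None."""
    value = attrs.get("URI")

    # Case 1: the value is an indirect reference; follow it to the real value.
    while hasattr(value, "resolve"):
        value = value.resolve()

    # Case 2: the value is a byte string; decode it (assumed UTF-8).
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # latin-1 maps every byte, so this fallback cannot fail.
            return value.decode("latin-1")

    # Case 3: the value is already a text string.
    if isinstance(value, str):
        return value

    # Missing entry or an unexpected type: nothing usable.
    return None


if __name__ == "__main__":
    print(extract_uri({"URI": "http://example.com"}))
    print(extract_uri({"URI": FakeObjRef(b"http://example.com/ref")}))
```

In the patch above, the returned text would then be wrapped in `Reference(ref, self.curpage)`; the sketch leaves that step out so it runs without pdfx installed.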

Oct 12 '18 19:10 theiostream

Thanks!

Nov 28 '18 21:11 morriscode
