pdfx
"URI" in PDF attributes may be a string itself
The `URI` value in an attribute object may itself be a string rather than a `PDFObjRef`. The current code only handles the `PDFObjRef` case, so every URI stored directly as a string is silently ignored, which drops many links. The following patch fixed the issue for me, though a cleaner solution may be preferable:
```diff
@@ -282,16 +279,22 @@ class PDFMinerBackend(ReaderBackend):
         if isinstance(obj_resolved, list):
             return [self.resolve_PDFObjRef(o) for o in obj_resolved]
         if "URI" in obj_resolved:
             if isinstance(obj_resolved["URI"], PDFObjRef):
                 return self.resolve_PDFObjRef(obj_resolved["URI"])
+            elif isinstance(obj_resolved["URI"], (str, unicode)):
+                # The URI is stored directly as a string, not as an indirect reference
+                if IS_PY2:
+                    ref = obj_resolved["URI"].decode("utf-8")
+                else:
+                    ref = obj_resolved["URI"]
+                return Reference(ref, self.curpage)
```
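A possibly more general approach is to normalize the `URI` value in one place. The sketch below is not pdfx code; it rests on a few assumptions. `extract_uri` and `ObjRef` are hypothetical names, and `ObjRef` only stands in for `PDFObjRef`: the helper relies solely on the value having a `resolve()` method. It also assumes that on Python 3 pdfminer.six may return PDF string objects as `bytes`, which the `(str, unicode)` check in the patch above would not match. The PDF specification defines a URI action's `URI` entry as 7-bit ASCII, so decoding it as UTF-8 should be safe.

```python
from typing import Any, Optional


class ObjRef:
    """Hypothetical stand-in for pdfminer's PDFObjRef, for illustration only."""

    def __init__(self, target: Any) -> None:
        self.target = target

    def resolve(self) -> Any:
        return self.target


def extract_uri(attrs: dict, max_depth: int = 8) -> Optional[str]:
    """Return the URI stored in an annotation attribute dict, or None.

    Accepts an indirect reference (anything with a resolve() method),
    a text string, or a byte string.
    """
    if "URI" not in attrs:
        return None
    value = attrs["URI"]

    # Follow indirect references, with a depth limit to guard against cycles.
    depth = 0
    while hasattr(value, "resolve"):
        if depth >= max_depth:
            return None
        value = value.resolve()
        depth += 1

    # Assumption: bytes here hold an ASCII URI, so UTF-8 decoding is lossless.
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")

    if isinstance(value, str):
        return value
    return None  # Some other object type: not a usable URI.


if __name__ == "__main__":
    print(extract_uri({"URI": "https://example.com"}))  # → https://example.com
    print(extract_uri({"URI": b"https://example.com"}))  # → https://example.com
    print(extract_uri({"URI": ObjRef(b"https://example.com")}))  # → https://example.com
    print(extract_uri({"S": "/URI"}))  # → None
```

Inside the backend, a non-`None` result would then be wrapped as `Reference(uri, self.curpage)`, just as the patch does. The `str`/`bytes` split would live in a single helper, with no `PDFObjRef` branch needed.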
Thanks!